ABSTRACT


Research  on  optical  data  processing  for  missile  guidance  and  robotics  is  described.  Components 
addressed  include  acousto-optic  cells.  Pattern  recognition  work  includes  feature  extraction  (Fourier 
coefficients  and  moments)  and  correlation  (using  synthetic  discriminant  functions).  All  pattern 
recognition  work  concerns  multi-class  distortion-invariant  pattern  recognition.  Optical  linear  algebra 
processors  are  addressed  with  attention  to:  algorithms,  architectures,  applications,  Kalman  filtering, 
system  fabrication,  accuracy  and  performance,  plus  error  source  modeling  and  simulation. 


1.  INTRODUCTION 

During  the  past  year  (September  1983  -  September  1984),  our  research  in  optical  data  processing  for 
missile  guidance  has  addressed  many  of  the  key  issues  and  aspects  of  this  technology.  This  research 
includes:  real-time  devices  and  components,  new  system  architectures,  new  high-speed  general  purpose 
optical  data  processing  techniques  and  systems,  tests  on  new  image  data  bases,  basic  studies  of  existing 
pattern  recognition  architectures,  and  new  pattern  recognition  techniques,  algorithms  and  concepts.  As  in 
past  years,  we  have  been  quite  faithful  in  reporting  our  AFOSR  sponsored  research  in  various  journals 
and  conference  publications.  Copies  of  the  more  relevant  papers  we  have  published  over  the  past  year  are 
included  as  chapters  of  this  report  to  provide  complete  documentation  of  each  aspect  of  our  work. 

In  Chapter  2,  we  provide  a  summary  and  overview  of  our  research  progress  achieved  over  the  past 
year.  This  work  addresses  five  vital  areas  of  optical  data  processing  research: 

1.  real-time  spatial  light  modulators  (Section  2.2  and  Chapter  3), 

2.  optical  pattern  recognition  (Section  2.3  and  Chapter  4), 

3.  optical  feature  extraction  (Section  2.4  and  Chapters  5-7), 

4.  optical  correlation  (Section  2.5  and  Chapter  8),  and 

5.  optical  linear  algebra  processors  (Section  2.6  and  Chapters  9-14). 

Topic (1) concerns the vital issue of real-time spatial light modulators. Topics (2)-(4) address
pattern recognition for ATR using optical pattern recognition (OPR) techniques. In this work, we have
taken care to address vital problems such as multi-class distortion-invariant pattern recognition of
military targets, the acquisition and importance of a large data base, and the effect of noise on the
algorithm used. Topic (5) concerns the most attractive item in optical processing at present, a
potentially quite general-purpose optical processor.

Details  on  the  more  salient  results  of  our  research  are  provided  in  Chapters  3-14.  References  are 
included  in  Chapter  15.  In  Chapter  16,  we  enumerate  our  AFOSR  sponsored  publications,  the 
presentations  given  on  this  research  at  conferences  and  seminars  during  the  past  year,  and  the  Master’s 
and  PhD  students  that  this  grant  has  supported. 


During the past year, the principal investigator (PI) presented invited talks on our AFOSR
sponsored research at various conferences, including the Critical Review of Technology SPIE Conference
on Optical Computing (SPIE, Los Angeles, CA, January 1984), the DoD conference on Parallel
Algorithms and Architectures for ATR (Leesburg, VA, July 1984), and various optical computing and
robotics conferences. The PI has chaired conference sessions and seminars and
served on the organizing committees for the following conferences and topics: SPIE (Robotics), IOCC
(Optical Computing), ICALEO (Optical Data Processing). One of his major papers in 1984 was an
invited paper on optical linear algebra processors for the July 1984 Proc. IEEE Special Issue on Optical
Computing.


2.  OVERVIEW  AND  SUMMARY 


2.1  INTRODUCTION 

Our  five  major  research  areas  and  our  recent  progress  in  each  are  highlighted  in  Sections  2.2  -  2.6. 
Details of each aspect of our thirteen work topics follow in Chapters 3-14.

2.2 SPATIAL LIGHT MODULATORS (ACOUSTO-OPTIC CELLS, CHAPTER 3)

Recently,  our  spatial  light  modulator  research  has  emphasized  acousto-optic  cells.  In  Chapter  3,  we 
discuss recent work in this area [1]. We have considered the salient acousto-optic architectures
(spectrum  analyzers  and  correlators).  The  various  acousto-optic  cell  and  acousto-optic  architecture 
component  errors  have  been  enumerated,  grouped  into  different  classes  and  combined  into  several  new 
models.  New  performance  measures  for  acousto-optic  correlators  and  spectrum  analyzers  were  defined 
and  detailed  (spectrum  estimation,  delay  estimation,  and  detection).  Each  is  an  appropriate  performance 
measure  for  a  different  application.  General  error-free  formulae  for  each  of  these  performance  measures 
were  derived  and  the  performance  obtained  with  each  was  described  and  quantified  as  a  function  of  the 
various system parameters. Our future work in this area will include component error source effects on
performance, the relationship of these models to optical linear algebra processors, and tests on
multi-channel acousto-optic cells.

2.3  OPTICAL  PATTERN  RECOGNITION  REVIEWS  (CHAPTER  4) 

Our  AFOSR  optical  pattern  recognition  research  is  at  the  forefront.  Our  paper  [2]  on  coherent 
optical  pattern  recognition  was  included  in  the  recent  Critical  Review  of  Technology  series  on  optical 
computing.  A  more  recent  review  [2]  was  the  only  optical  pattern  recognition  paper  at  a  DoD  conference 
on  parallel  architectures  and  algorithms  for  ATR.  For  completeness  and  as  an  introduction  and  overview, 
we  summarize  recent  coherent  optical  pattern  recognition  research.  A  full  length  journal  paper  on  this 
topic with extensive references is expected to be in the Optical Engineering Special Issue on Optical
Computing  [23]  in  January  1985  and  will  be  included  in  our  1985  report.  Chapter  4  reviews  optical 
techniques  for  feature  extraction  and  correlation,  new  algorithms,  architectures  and  hybrid  optical/digital 
concepts [2].

2.4 OPTICAL PATTERN RECOGNITION FEATURE EXTRACTION
(CHAPTERS 5 - 7)

Two  new  optical  feature  extraction  techniques  are  detailed:  the  use  of  new  feature  extractors  and 
dimensionality reduction techniques on a wedge ring detector-sampled optically-produced feature space
(Chapter  5)  and  a  hierarchical  two-level  hybrid  optical/digital  moment  feature  processor  (Chapters  6  and 
7).  Our  earlier  conference  paper  [4]  on  an  optical  Fourier  coefficient  feature  space  has  been  improved  and 
expanded  into  a  journal  paper  [5]  for  a  special  issue  on  robot  vision.  In  Chapter  5  [4],  this  work  is 
summarized.  It  includes  four  different  dimensionality  reduction  and  feature  extraction  techniques,  a  new 
classifier concept, quantitative data on the importance of amplitude versus phase Fourier coefficients (for
pattern  recognition,  rather  than  image  reconstruction)  and  the  performance  of  each  in  the  presence  of 
noise. Experimental results for two letters and two vehicles, with 25 images of each at different scales and
in-plane rotations, were obtained. In Chapter 6, our new hybrid optical/digital moment
processor,  a  new  hierarchical  class  estimator,  and  a  new  two-level  classifier  are  detailed  and  results 
obtained  on  a  set  of  over  300  robot  objects  (pipe  parts)  [6].  New  quantitative  and  analysis  data  for  our 
ship image data base will shortly be published [7]. The performance of the system on non-controlled
imagery  and  the  necessary  pre-processing  required  are  included  [8]  in  Chapter  7.  Our  future  work  will 
involve  laboratory  optical  Fourier  coefficient  research,  new  theoretical  and  optical  laboratory  work  on 
chord  distributions,  fundamental  work  on  training  set  selection,  laboratory  optical  moment  system 
fabrication, and generic object recognition using optical feature extractors and synthetic filters. Our feature
extraction  work  will  continue  to  address  distortion-invariant  multi-class  object  recognition  and 
performance  in  the  presence  of  noise. 

2.5 OPTICAL PATTERN RECOGNITION CORRELATORS (CHAPTER 8)


Our  distortion-invariant  multi-class  multi-object  correlator  research  emphasizes  synthetic 
discriminant functions (SDFs). The basic SDF synthesis algorithms have now been unified [9] (Chapter
8).  Our  tests  of  projection  SDFs  on  ship  images  with  data  on  noise  performance  and  guidelines  for  the 
selection  of  projection  values  are  expected  to  appear  [10]  in  a  special  journal  issue  on  pattern  recognition 
late  this  year.  These  results  will  be  included  in  our  1985  Final  Report.  Three  new  types  of  SDFs  have 
been  devised  and  initial  results  with  them  have  been  obtained  for  a  tank  and  APC  image  data  base  [11]. 
These details will be available shortly and will be included in our 1985 report together with initial results
on  linear  functional  (optimal  linear  discriminant  functions)  SDFs.  Laboratory  experimental  data,  system 
fabrication  concepts  and  optical  matched  spatial  filter  work  will  be  major  future  work  issues  together 
with  various  extensions  of  new  SDFs  and  their  applications  to  different  correlation  pattern  recognition 
ATR  data  bases. 

2.6  OPTICAL  LINEAR  ALGEBRA  PROCESSORS  (CHAPTERS  9  -  14) 

This optical data processing application area has received considerable recent attention.

Our  recent  work  in  this  area  has  included  extensions  of  previous  LU  and  other  direct  matrix 
decomposition  algorithms  and  architectures  and  new  algorithms  and  architectures  for  back-substitution 
and  the  solution  of  triangular  systems  of  LAEs  (linear  algebraic  equations).  Most  recently,  a  parallel  QR 
algorithm and its implementation were detailed by us [12,13]. This completes the major algorithm optical
realization  work  on  direct  and  indirect  linear  algebra  solutions  to  systems  of  LAEs.  A  recent  special  issue 
of  the  Proc.  IEEE  on  optical  computing  summarizes  our  architecture,  algorithm,  data  flow  and  selected 
applications research on optical linear algebra processors. Chapter 9 details this work [14]. It is
extremely noteworthy since one optical linear algebra processor system can achieve all necessary
operations by format control.

A  second  vital  aspect  of  optical  linear  algebra  research  that  we  initiated  was  the  error  source 
modeling and simulation of OLAP (optical linear algebra processor) architectures and algorithms [15].
Chapter 10 details this work [15] and our initial results using it in the comparison of direct and iterative
solutions  of  LAEs  on  OLAPs.  A  third  facet  of  our  OLAP  research  has  concerned  specific  applications. 
The  operation  chosen  for  major  attention  was  Kalman  filtering  and  the  specific  application  of  it  was 
missile  guidance  and  control.  In  [16]  we  first  advanced  the  details  of  a  general  Kalman  filter  realization 
on one type of OLAP. Chapter 11 details this work fully [17]. A new architecture for Kalman filtering
when  noise  statistics  are  known  [18]  was  also  devised.  It  is  detailed  in  Chapter  12.  New  algorithms  and 
optical  architectures  and  accuracy  issues  concerning  this  and  other  applications  will  be  available  in  1985 
as  we  unify  our  algorithm,  architecture,  modeling  and  simulation  research  on  this  application.  We  have 
detailed the use of residue arithmetic in OLAPs [19] to achieve increased accuracy and have found other
methods  to  be  preferable  to  the  use  of  residue  arithmetic. 

The  major  linear  algebra  operation  required  in  Kalman  filtering  is  the  solution  of  a  nonlinear 
quadratic  matrix  equation.  We  have  devised  a  new  algorithm  to  achieve  this  using  a  fixed  number  of 
iterations.  We  have  quantified  all  operational  parameters  for  the  algorithm,  simulated  several  solutions 
of  it  using  different  algorithms,  assessed  the  effect  of  different  optical  system  errors,  the  dominant  optical 
system  errors,  and  the  effect  of  multiple  errors  as  well  as  quantified  the  performance  of  the  algorithm  and 
provided  a  laboratory  OLAP  demonstration  of  it.  This  work  is  detailed  [20]  in  Chapter  13. 

The  fourth  and  final  aspect  of  our  OLAP  research  has  been  attention  to  fabrication  of  an  OLAP. 
We recently [21] clarified that the number of operations achievable on our frequency-multiplexed
processor is comparable to that of others, showed this equivalence, and showed that our processor is
preferable from a fabrication standpoint. We also detailed 4-5 different techniques for fabrication of such
a system and provided the first laboratory experimental data on the performance and operation of an optical systolic
processor.  These  results  are  highlighted  in  Chapter  14.  In  1985,  we  expect  significant  laboratory  OLAP 
results  to  emerge.  Many  applications  for  OLAPs  exist.  Reference  [14]  details  several  others  and  reference 
[22]  discusses  their  use  in  pattern  recognition. 


3.  TIME-INTEGRATING  ACOUSTO-OPTIC 
CORRELATOR:  ERROR  SOURCE 
MODELING 


Time-integrating  acoustooptic  correlator: 
error  source  modeling 

David  Casasent,  Anastasios  Goutzoulis,  and  B.  V.  K.  Vijaya  Kumar 


The  error  sources  present  in  a  time-integrating  acoustooptic  correlator  are  considered.  They  are  classified 
and modeled into three categories: input plane errors; frequency plane errors; and detector plane errors.
To facilitate error analyses, performance measures are defined and quantified for an error-free system for
detection and delay estimation applications.


I.  Introduction 

Optical signal processors provide real-time operations
on high bandwidth and time-bandwidth product data
of long duration and high center frequency. These
features plus the rapidly maturing commercial availability
of acoustooptic components are the major reasons
optical signal processors (OSPs) employing
acoustooptic devices have recently received considerable
attention.1-6 These acoustooptic systems offer a most
attractive approach to signal processing problems in
which data with high time bandwidths and variable
codes must be processed.

Acoustooptic (AO) devices can be incorporated into
various architectures. These OSP systems1,4 can be
divided into two general classes: (1) correlators and (2)
spectrum analyzers. Both system classes can be realized
by performing the necessary integration in space
or in time.1

Time-integrating  (TI)  processors7  have  received 
considerable  attention  because  they  can  accommodate 
extremely  large  time-bandwidth  (TBW)  product  data 
and  because  many  new  and  attractive  TI  algorithms8 
and  architectures9  exist.  An  important  feature  of  TI 
processors  is  their  ability  to  operate  on  signals  with  very 
large  TBW  product  with  the  ability  to  change  (on-line) 
the  signal  code  being  processed. 

Despite the rapidly increasing use of AO devices, little
attention (in the literature) has been given to the
modeling of the components of such systems, to the
effect various system parameters and component error
sources have on the performance of these systems, and
to the performance measures used to describe, analyze,
and design such processors. In this paper, we advance
the first such formulation for TI bulk AO correlators.
In Sec. II, we briefly review the operation of a TI AO
correlator. Our categorization and modeling of the
various error sources are given in Sec. III, and their
enumeration and origin are then presented in Sec. IV.

In Sec. V, we discuss the performance measures we
chose to describe the accuracy and performance of the
TI AO correlator. We consider two different correlator
applications (detection and delay estimation) and employ
different performance measures for each. The
basic error-free analyses for a TI AO correlator for detection
and delay estimation are then presented (Secs.
VI and VII). These analyses provide the basic statistical
framework for further analyses that include and
quantify the effects of the various error sources. Such
analyses will be the subject of future publications.

II.  Signal  Correlation  with  TI  AO  Correlators 

The basic operation of a TI AO correlator is explained
with the aid of Fig. 1. The signals to be correlated are
$s_a(t)$ and $s_b(t)$. $s_b(t)$ is usually a delayed version of
$s_a(t)$ and includes some additive noise $n(t)$. For linear
intensity modulation9 of the AO cells, the signals are
added to two biases $B_1$ and $B_2$ and used to modulate the
amplitude of an rf carrier. Thus the baseband electrical
inputs to the laser diode (or other input point modulator)
and the AO cell are

$$s_1(t) = [B_1 + s_a(t)], \tag{1}$$

$$s_2(t) = [B_2 + s_b(t)]. \tag{2}$$

The intensity of the data portion of the light leaving
plane $P_1$ is proportional to

$$I_1(t) = B_2 + s_b(t). \tag{3}$$

Fig. 1. Schematic diagram of a time-integrating acoustooptic
correlator.

This light beam is expanded by lens $L_1$ and uniformly
illuminates the AO cell at plane $P_2$. Thus Eq. (3) also
describes the light intensity incident on $P_2$. Note that
it varies only in time and not spatially. Lenses $L_2$,
$L_3$ and the spatial filter at $P_3$ separate the undiffracted
and diffracted orders, block the undiffracted order light,
and image the first-order diffracted light onto a detector
array at plane $P_4$. Denoting the detector's time constant
of integration by $T_I$, the final detected output at
$P_4$ is (including all bias terms)

$$I_4(\tau) = (1/T_I) \int_{-T_I/2}^{T_I/2} [B_2 + s_b(t)]\,[B_1 + s_a(t - \tau)]\,dt, \tag{4}$$

where $\tau = x/v$, $x$ is the direction of sound propagation
in the AO cell, and $v$ is the velocity of sound in the AO
crystal. The second term in Eq. (4) is the modulation
on the first-order term in the transmittance of $P_2$.
Equation (4) can be further simplified to

$$I_4(\tau) = B + B_s + (1/T_I) \int_{-T_I/2}^{T_I/2} s_b(t)\,s_a(t - \tau)\,dt, \tag{5}$$

which  is  recognized  as  the  desired  correlation  (last  term) 
on  a  signal-independent  bias  B  and  a  signal-dependent 
bias  Bs  with  both  temporal  and  spatial  dependence. 
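
The structure of Eqs. (4) and (5) can be illustrated numerically. The following is a minimal sketch, assuming discrete-time samples, a random binary code for the reference signal, and Gaussian additive noise. The function names, bias values, and noise level are illustrative choices and are not taken from the paper. For each candidate delay it time-averages the product of the two biased signals and then reports the delay at which the output peaks.

```python
import random


def ti_correlator_output(s_a, s_b, b1, b2, max_lag):
    """Time-average of [b2 + s_b(t)] * [b1 + s_a(t - lag)] for lag = 0..max_lag."""
    n = len(s_b)
    output = []
    for lag in range(max_lag + 1):
        acc = 0.0
        for t in range(lag, n):
            acc += (b2 + s_b[t]) * (b1 + s_a[t - lag])
        output.append(acc / (n - lag))
    return output


def estimate_delay(true_delay=7, n=2000, noise_sigma=0.5, seed=0):
    """Build s_b as a delayed, noisy copy of s_a and locate the output peak."""
    rng = random.Random(seed)
    # Assumed reference signal: a random +/-1 code (hypothetical choice).
    s_a = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    s_b = [
        (s_a[t - true_delay] if t >= true_delay else 0.0)
        + rng.gauss(0.0, noise_sigma)
        for t in range(n)
    ]
    out = ti_correlator_output(s_a, s_b, b1=1.0, b2=3.0, max_lag=20)
    return out.index(max(out))
```

In this sketch the correlation peak sits on a nearly constant pedestal contributed by the bias terms, which is the discrete counterpart of the B and Bs terms in Eq. (5).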

Many AO architectures exist9 that utilize amplitude
rather than intensity modulation of the AO cell. In
such cases, the detector output has the general form9

$$I_4(\tau) = B + B_s + (1/T_I)\,m \cos(2\pi f_0 \tau) \int_{-T_I/2}^{T_I/2} s_b(t)\,s_a(t - \tau)\,dt, \tag{6}$$

where $B$ is a bias, $B_s$ is a signal-dependent bias with
both temporal and spatial variation, $m$ is a constant, and
$\cos(2\pi f_0 \tau)$ is a spatial carrier where $f_0$ is the frequency
of a reference rf oscillator purposely included with the
input signals. This electronic reference allows the
correlation term to be separated from the bias terms by
bandpass filtering the electrically read out version of Eq.
(6).

The  intensity  and  amplitude  modulation  modes  for 
TI  AO  correlators  have  many  well-known9  advantages 
and  disadvantages.  Our  initial  objective  is  to  model  the 
various  component  error  sources  and  system  parameters 
so  that  our  results  are  appropriate  for  both  modulation 
modes. To achieve this, we consider the intensity
modulation scheme, which for $B = B_s = 0$ is equivalent
to the amplitude modulation scheme (after the necessary
postprocessing filtering).


III.  Error  Source  Modeling 

In this section, we present the mathematical models
we use to describe the various types of component error
in an AO TI correlator. The ideal model would express
the system's output as a function of all the error parameters;
however, because of the number of error
sources and their nature, such a model cannot be analyzed
statistically. Thus, we propose to model, study,
and quantify several independent classes of errors and
to determine the lower bound of the system's performance
for each error class independently. We thus
include three classes of error distinguished by whether
their effects are modeled in the input, frequency, or
output detector plane. Fortunately, there are only a
few error sources that affect more than one error plane.
We elaborate on these errors and the way to treat them
in Sec. IV.

If the error to be considered occurs in the input plane
$P_2$, it is directly mapped onto the output plane, and we
thus describe it by including a multiplicative weighting
function $w(\tau)$ in the processor's output. Equation (5)
becomes

$$I(\tau) = w(\tau)\left[B + B_s + (1/T_I) \int_{-T_I/2}^{T_I/2} s_b(t)\,s_a(t - \tau)\,dt\right]. \tag{7}$$

This class of error is unique, since it maps directly
onto the output plane. The effect of this type of error
is local rather than global and can thus be corrected by
postdetection processing.
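
Because the weighting in Eq. (7) acts point by point on the output, a known or calibrated weighting can in principle be divided out after detection. The sketch below assumes the weighting has been measured beforehand and is sampled at the same output positions as the data. The function names and the `min_weight` guard are hypothetical and are not part of the paper.

```python
def apply_input_plane_weighting(ideal_output, weights):
    """Model of Eq. (7): each output sample is scaled by its local weight."""
    return [w * y for w, y in zip(weights, ideal_output)]


def postdetection_correction(measured_output, weights, min_weight=1e-6):
    """Divide out a previously calibrated weighting, sample by sample.

    min_weight is an illustrative guard against division by very small
    weights; it is an assumption of this sketch.
    """
    return [y / max(w, min_weight) for w, y in zip(weights, measured_output)]
```

A frequency plane error does not admit this kind of sample-by-sample correction, since it mixes contributions from all output positions.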

The second class of errors comprises those that affect the
frequency response of the system. (These concern the
AO cell and the lenses.) Such error sources are
best modeled by a weighting function in the
frequency plane. Because of the excellent quality of
state-of-the-art lenses plus the fact that lens effects in
optical processors have been studied in detail elsewhere,10
we restrict our attention to the AO cell frequency
response. With this in mind, we include the
impulse responses $h_1(t)$ and $h_2(t)$ of the AO cell and the
input point modulator. With these factors included,
we can describe the output of the processor as

$$I(\tau) = B + B_s + (1/T_I) \int_{-T_I/2}^{T_I/2} \iint_{-\infty}^{\infty} h_2(\lambda_2)\,s_b(t - \lambda_2)\,h_1(\lambda_1)\,s_a(t - \tau - \lambda_1)\,d\lambda_1\,d\lambda_2\,dt, \tag{8}$$

where $B_s$ now includes the effects of $h_1$ and $h_2$. To
obtain Eq. (8), we used the fact that the convolution of
the input signal $s(t)$ with the system's impulse response
$h(t)$ can be written as $\int h(\lambda)\,s(t - \lambda)\,d\lambda$. In writing Eq.
(8), we also assumed that the AO cell is operated in a
linear intensity mode, as is necessary for an AO correlator.
We also note that intermodulation products in
a TI correlator do not appear in specific spatial locations
(as is the case for space-integrating spectrum analyzers),
but rather they tend to be uniformly distributed over
the output plane, and thus their effect is rather
small.

For most statistical analyses, the form of Eq. (8) is not
convenient. This is because extensive convolutions
have to be evaluated in the time domain. Such a task
is not trivial in digital simulators, since most signals of
interest are most commonly described in terms of their
frequency domain characteristics. It is thus preferable
that this class of errors be studied in the frequency domain.
This choice is also convenient because the
transfer functions $H(f) = \mathcal{F}[h(t)]$ are easily measured
for the two real-time devices in the system. To express
Eq. (8) in the frequency domain, i.e., in terms of $H(f)$,
we first form the expected value of Eq. (8) as below:

$$E[I(\tau)] = E[B] + E[B_s] + (1/T_I) \int_{-T_I/2}^{T_I/2} \iint_{-\infty}^{\infty} h_2(\lambda_2)\,h_1(\lambda_1)\,E[s_a(t - \tau - \lambda_1)\,s_b(t - \lambda_2)]\,d\lambda_1\,d\lambda_2\,dt. \tag{9}$$

For analytical simplicity, we consider the case of $t_0 = 0$
(without any loss of generality). Then the received
signal $s_b(t)$ is simply $s_a(t) + n(t)$, and Eq. (9) becomes

$$E[I(\tau)] = B + E[B_s] + (1/T_I) \int_{-T_I/2}^{T_I/2} \iint_{-\infty}^{\infty} h_2(\lambda_2)\,h_1(\lambda_1)\,\{E[s_a(t - \tau - \lambda_1)\,s_a(t - \lambda_2)] + E[s_a(t - \tau - \lambda_1)\,n(t - \lambda_2)]\}\,d\lambda_1\,d\lambda_2\,dt. \tag{10}$$

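Since the noise $n(t)$ is of zero mean and is statistically
independent of the signal $s_a(t)$, Eq. (10) simplifies
to

$$\begin{aligned} E[I(\tau)] &= B + E[B_s] + (1/T_I) \int_{-T_I/2}^{T_I/2} \iint_{-\infty}^{\infty} h_2(\lambda_2)\,h_1(\lambda_1)\,R_s(\tau + \lambda_1 - \lambda_2)\,d\lambda_1\,d\lambda_2\,dt \\ &= B + E[B_s] + \iint_{-\infty}^{\infty} h_2(\lambda_2)\,h_1(\lambda_1)\,R_s(\tau + \lambda_1 - \lambda_2)\,d\lambda_1\,d\lambda_2. \end{aligned} \tag{11}$$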

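Expressing the signal autocorrelation function $R_s(\tau)$ in
terms of its power spectral density $P_s(f)$, i.e.,

$$R_s(\tau) = \int_{-\infty}^{\infty} P_s(f) \exp(j2\pi f\tau)\,df, \tag{12}$$

Eq. (11) becomes

$$\begin{aligned} E[I(\tau)] &= B + E[B_s] + \iiint_{-\infty}^{\infty} h_2(\lambda_2) \exp(-j2\pi f\lambda_2)\,h_1(\lambda_1) \exp(j2\pi f\lambda_1)\,P_s(f) \exp(j2\pi f\tau)\,d\lambda_1\,d\lambda_2\,df \\ &= B + E[B_s] + \int_{-\infty}^{\infty} H_1(f)\,H_2(f)\,P_s(f) \exp(j2\pi f\tau)\,df, \end{aligned} \tag{13}$$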

where $H_2(f)$ and $H_1(f)$ are the transfer functions of the
point modulator (or laser diode) and the AO cell, respectively.
When formulated as in Eq. (13), the effect
of all the frequency plane errors can be described by one
transfer function $H(f) = H_1(f)H_2(f)$. This class of errors
is global in nature and cannot be corrected by
postdetection electronic processing; it is thus appreciably
different from the input plane weighting error
sources whose effects were described by Eq. (7).
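
The step from the double time-domain integral to the single frequency-domain integral can be checked numerically on a small discrete example. The sketch below is an illustration only: it assumes circular (periodic) sequences, a hand-written DFT, and arbitrary example values for the two impulse responses and the power spectral density. With the DFT sign convention used here and a real but asymmetric impulse response for the AO cell, that factor enters the discrete identity as the complex conjugate of its DFT; for a symmetric impulse response the conjugate has no effect.

```python
import cmath


def dft(x):
    """Forward DFT with exp(-j 2 pi k t / n) kernel."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT with exp(+j 2 pi k t / n) kernel and 1/n scaling."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def time_domain_term(h1, h2, r):
    """Double sum over lambda1, lambda2 of h2 * h1 * R(tau + lambda1 - lambda2)."""
    n = len(r)
    return [
        sum(
            h2[l2] * h1[l1] * r[(tau + l1 - l2) % n]
            for l1 in range(n)
            for l2 in range(n)
        )
        for tau in range(n)
    ]


def frequency_domain_term(h1, h2, psd):
    """Single sum over frequency of conj(H1) * H2 * P_s, transformed back to tau."""
    big_h1, big_h2 = dft(h1), dft(h2)
    return idft([big_h1[k].conjugate() * big_h2[k] * psd[k] for k in range(len(psd))])


def max_mismatch(n=16):
    """Largest difference between the two evaluations for example inputs."""
    psd = [1.0 / (1.0 + min(k, n - k) ** 2) for k in range(n)]  # symmetric, nonnegative
    r = idft(psd)                                               # autocorrelation samples
    h1 = [0.6, 0.3, 0.1] + [0.0] * (n - 3)                      # hypothetical AO cell response
    h2 = [0.8, 0.2] + [0.0] * (n - 2)                           # hypothetical modulator response
    td = time_domain_term(h1, h2, r)
    fd = frequency_domain_term(h1, h2, psd)
    return max(abs(a - b) for a, b in zip(td, fd))
```

The frequency-domain route replaces an order n-squared sum per output point with a single product of measured transfer functions and the signal spectrum, which is the practical reason for preferring it in simulation.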

3132 APPLIED OPTICS / Vol. 23, No. 18 / 15 September 1984

The third class of errors comprises those best
classified as detector plane errors. (They are due to the
output detectors used.) Some of these detector errors
can be considered as spatial response variations. They
are best modeled by including them in our input plane
weighting factor $w(\tau)$. The next detector plane error
we consider is the output plane sampling (i.e., the fact
that the output plane detectors are of finite size or
length $D$ in one dimension). This effect causes a spatial
integration of the output over $D$ followed by a temporal
integration over $T_I$. We describe such detector effects
by writing the observed output as an integral over space
($d\tau$) followed by an integral in time ($dt$), i.e.,


$$I_k = \int_{-T_I/2}^{T_I/2} \int_{(k-1/2)D}^{(k+1/2)D} w_k(\tau)\,[B + B_s + s_b(t)\,s_a(t - \tau) + n_k(t)]\,d\tau\,dt, \tag{14}$$

where $k$ is the detector element number, $w_k(\tau)$ is the
spatial weighting function across the detector $k$, $n_k(t)$
is the noise of the $k$th detector element (this includes
detector element cross talk), and where the integration
over the 1-D detector area $D$ describes the effect of the
finite size of each detector element.
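
The spatial integration over each finite detector element can be pictured as grouping adjacent fine-grained output samples and summing them with the element's spatial weighting. The following minimal sketch assumes the output plane is already sampled more finely than the detector pitch and that every element shares the same weighting profile. The function name and its parameters are illustrative and do not come from the paper.

```python
def integrate_over_detectors(fine_output, samples_per_detector, element_weights=None):
    """Sum groups of adjacent output-plane samples, one group per detector element.

    element_weights is an assumed per-element spatial weighting profile;
    it defaults to a uniform response.
    """
    if element_weights is None:
        element_weights = [1.0] * samples_per_detector
    n_detectors = len(fine_output) // samples_per_detector
    readings = []
    for k in range(n_detectors):
        start = k * samples_per_detector
        readings.append(
            sum(
                element_weights[i] * fine_output[start + i]
                for i in range(samples_per_detector)
            )
        )
    return readings
```

A correlation peak narrower than one element is reported only at the resolution of the element that contains it, which is the sampling effect this error class describes.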


IV.  Classification  of  Error  Sources 

In  this  section,  we  consider  the  origin  of  the  specific 
system  error  sources  that  give  rise  to  the  three  types  of 
error we isolated in Sec. III. We also discuss several
other  component  errors  present  in  an  AO  TI  correlator 
and  how  to  treat  their  effects. 

In the case of input plane errors, we include the input
optical beam profile, spatial variations in the AO cell
response, and the nonuniform element-to-element response
of the detector array. The input optical beam
has a Gaussian rather than a plane-wave profile that can
be described as11

$$w_{GB}(\tau) = \exp\{-2[(\tau - \tau_c)/W]^2\}, \tag{15}$$

where the beam-taper coefficient $W$ is the beamwidth
at which the input light intensity is down by $\exp(-2)$
and $\tau_c$ denotes the center of the AO cell. This effect
can be reduced by proper design of the collimation lens
system $L_1$. In most practical situations (AO cells with
2-3-cm aperture), beam uniformity to within 5% can be
achieved without significant loss of input light. This
corresponds to a worst-case weighting of 0.22 dB across
the output. This can be reduced further by postdetection
processing (since the errors are spatially fixed).
Thus we ignore this effect in our future analysis.
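
As a worked illustration of the beam-taper weighting, the sketch below assumes the Gaussian form exp{-2[(tau - tau_c)/W]^2}, with W measured from the cell center to the point where the intensity is down by exp(-2). Under that assumption it computes how large W must be, relative to the half aperture, for the weighting at the aperture edge to stay at a chosen level such as 0.95. The function names and the normalization are assumptions of this sketch.

```python
import math


def gaussian_taper(tau, tau_c, w):
    """Assumed Gaussian intensity weighting, equal to 1 at the cell center."""
    return math.exp(-2.0 * ((tau - tau_c) / w) ** 2)


def taper_for_edge_level(half_aperture, edge_level):
    """W such that the weighting at the aperture edge equals edge_level."""
    return half_aperture * math.sqrt(2.0 / math.log(1.0 / edge_level))
```

Under these assumptions, holding the edge weighting at 0.95 requires W to be several times the half aperture, so only the central part of the Gaussian beam illuminates the cell.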

The AO devices are the system components with the
most significant input-plane errors. These errors include
(1) beam walkoff (referring to the fact that the
acoustic beam does not travel normal to the transducer
as it propagates along the cell); (2) reflections from the
sides of the cell (referring to the fact that the acoustic
wave diffracts as it leaves the transducer and thus can
strike the sides of the cell and suffer multiple reflections
before reaching the end of the cell; this results in a
nonuniform acoustic field and is particularly important
when long AO delay lines are used); (3) near-field effects
(referring to the Fresnel pattern resulting from the
transducer excitation); and (4) acoustic attenuation
(referring to the fact that the acoustic field strength
decreases exponentially within the cell with increasing
distance from the transducer and as a function of
frequency).


This  last  AO  error  source  contributes  to  both  the 
input  plane  and  frequency  plane  errors.  For  a  fixed 
signal  bandwidth,  we  calculate  the  resulting  weighting 
due  to  acoustic  attenuation  as  a  function  of  distance 
only and incorporate it into our $w(\tau)$ input plane
weighting function in Eq. (7). To model
the frequency dependence of the acoustic attenuation
$\alpha$ (i.e., $\alpha \propto f^2$), we include its effect in $H(f)$ for the AO
cell.
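
The two roles of acoustic attenuation (a spatial weighting at fixed frequency and a frequency-dependent loss) can be sketched as follows. This is an illustration under stated assumptions: the loss in dB is taken to grow linearly with acoustic delay and quadratically with frequency, the result is treated as an intensity weighting, and the material constant `gamma_db_per_us_ghz2` is a hypothetical parameter whose value would come from the AO material data.

```python
def attenuation_weight(tau_us, f_ghz, gamma_db_per_us_ghz2):
    """Intensity weighting after tau_us of acoustic delay at frequency f_ghz.

    Assumes loss_dB = gamma * f^2 * tau, i.e., exponential decay with distance
    and a quadratic frequency dependence.
    """
    loss_db = gamma_db_per_us_ghz2 * f_ghz ** 2 * tau_us
    return 10.0 ** (-loss_db / 10.0)


def attenuation_profile(aperture_us, f_ghz, gamma_db_per_us_ghz2, n_points):
    """Weighting across the cell aperture at one fixed frequency."""
    step = aperture_us / (n_points - 1)
    return [
        attenuation_weight(i * step, f_ghz, gamma_db_per_us_ghz2)
        for i in range(n_points)
    ]
```

Evaluating `attenuation_profile` at the center frequency gives a spatial weighting of the kind folded into the input plane model, while evaluating `attenuation_weight` across frequency at a fixed delay gives a contribution to the magnitude of the cell's transfer function.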

Many of these error sources can be reduced or corrected for to various degrees. The beam
walkoff can be minimized by proper AO cell design and accurate crystal cut. The sound
reflections and near-field effects can be partially corrected by either careful AO cell
design or optical spatial filtering. The effects of acoustic attenuation can be reduced (at
one frequency) by electronic postprocessing and by a fixed optical mask whose transmittance
compensates the acoustic attenuation's weighting. The last input plane error is the
element-to-element nonuniform response in the detector array. State-of-the-art detector
arrays have a uniformity of 90-95%. This corresponds to a maximum 0.46-dB spatial variation
in the output plane intensity data. This weighting is generally negligible but is
correctable if required for a given application.
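The 0.46-dB figure follows from expressing the worst-case uniformity as an intensity ratio in decibels. A minimal check, with a hypothetical function name, is:

```python
import math

def uniformity_to_db(uniformity):
    """Intensity variation in dB corresponding to a fractional detector uniformity."""
    return 10.0 * math.log10(1.0 / uniformity)

worst_case_db = uniformity_to_db(0.90)   # 90% uniformity
best_case_db = uniformity_to_db(0.95)    # 95% uniformity
```

The 90% case gives the quoted maximum; the 95% case gives roughly half of that.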

Next we enumerate the frequency plane errors. In this category, we include the nonideal
transfer function H(f) = |H(f)| exp[jφ(f)] of the AO cells, where the magnitude |H(f)| and
phase φ(f) are functions of the frequency of the input signal. It is known (Ref. 12) that
|H(f)| is the product of (1) the transducer's transfer function, (2) the shape of the
acoustic interaction bandwidth, and (3) spatial frequency response terms due to dispersion
and the finite AO cell aperture. The magnitude of the transfer function is also affected by
the acoustic attenuation as noted earlier. The AO cell's phase response φ(f) is composed of
(1) the transducer's phase response and (2) the optical phase within the cell. The
transducer's phase is in general nonlinear (Ref. 12) with a shape that depends mainly on the
bonding techniques used. (A thin bond yields a quite nonlinear phase response, whereas a
quarterwave bond yields a less nonlinear phase response but a poorer
electrical-to-acoustical conversion efficiency for the transducer.) The optical phase
effects are due to (1) off-axis acoustic beams that propagate in different directions due to
beam walkoff and other effects (each off-axis beam will have a different phase) and (2) the
finite transducer width. (This results in acoustic diffraction, which in turn results in
off-axis beams.) With careful AO cell design and external optical filtering, the optical
phase effects can be minimized, and the transducer's phase effects will dominate. We will
assume this in future analyses.

In Ref. 12, exact expressions for |H(f)| and models for φ(f) have been calculated. The
|H(f)| expressions are quite complex, and it is thus difficult to incorporate them in any
statistical analysis. Thus the alternative we adopted was to measure |H(f)| (from the
optical Fourier transform of the light leaving the AO cell when it is driven by a linear FM
signal) and to approximate it by a mathematical function. This alternative approach has
yielded simpler expressions for |H(f)| that can be incorporated in a statistical analysis.
This class of errors is not correctable, and thus its effects on the system's performance
will be studied in our future work.

The last class of error sources is that due to the output correlation plane detectors. These
include (1) the sampling and area integration due to the finite detector area D, (2) the
spatial weighting function due to the trapezoidal (Ref. 13) spatial response across each
detector element, (3) the location of the output correlation peak within one detector
element, (4) detector noise, and (5) cross talk between detectors. The effect of detector
noise has been considered (Ref. 9), and the effect of finite detector area has been
initially addressed (Ref. 14). The remaining detector error sources and the effects of all
errors on our performance measures merit further research.


V.  Performance  Measures 


Let us now discuss the performance measures that we will use in our error-free analyses
(Secs. VI and VII) and in our future work. A correlator has two main purposes: (1) detection
of the presence of a signal and (2) estimation of its location. These two different
applications require different performance measures.

As detection performance measures, one should use the probability of detection P_D, the
probability of false alarm P_FA, and the probability of error P_e. P_D is the probability
that the correlation value at the peak, C(0), will exceed a threshold θ when the signal is
present. It is given by (Ref. 15)

$$P_D = \frac{1}{\sqrt{2\pi\,\mathrm{var}[C(0)]}} \int_{\theta}^{\infty} \exp\left\{-\frac{[x - E[C(0)]]^2}{2\,\mathrm{var}[C(0)]}\right\} dx, \qquad (16)$$

where E[C(0)] and var[C(0)] are the expected value and variance at the correlation peak and
where C(0) is modeled by a Gaussian random variable from central limit theorem arguments.
P_D will be less than unity because of noise and because of the statistical nature of the
signals. The presence of noise also results in a nonzero probability that the value of the
correlation at the peak will exceed the threshold when the signal is absent. This is the
probability of false alarm. It can be described (Ref. 15) as

$$P_{FA} = \frac{1}{\sqrt{2\pi\,\mathrm{var}[C(\tau)]}} \int_{\theta}^{\infty} \exp\left\{-\frac{[x - E[C(\tau)]]^2}{2\,\mathrm{var}[C(\tau)]}\right\} dx, \qquad (17)$$

where the mean E[C(τ)] and the variance var[C(τ)] of the noise in the output plane can be
estimated by evaluating C(τ) at τ ≫ 0. With no loss of generality, we assume the correlation
peak occurs at τ = 0. Assuming equal probabilities for the presence and absence of a signal,
P_e is given by

$$P_e = \tfrac{1}{2}\,(1 + P_{FA} - P_D). \qquad (18)$$
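Because the correlation output is modeled as Gaussian, the three detection measures reduce to tail probabilities of a normal distribution. The sketch below evaluates them with the complementary error function and also inverts the false-alarm expression to find the threshold for a required P_FA. The function names and all numerical values are illustrative assumptions, not results from the paper.

```python
import math
from statistics import NormalDist

def gaussian_tail(threshold, mean, variance):
    """P(X > threshold) for a Gaussian X with the given mean and variance."""
    return 0.5 * math.erfc((threshold - mean) / math.sqrt(2.0 * variance))

def detection_measures(theta, mean_peak, var_peak, mean_off, var_off):
    """Return (P_D, P_FA, P_e) for threshold theta under the Gaussian model."""
    p_d = gaussian_tail(theta, mean_peak, var_peak)
    p_fa = gaussian_tail(theta, mean_off, var_off)
    p_e = 0.5 * (1.0 + p_fa - p_d)   # equal priors for signal present and absent
    return p_d, p_fa, p_e

def threshold_for_pfa(p_fa, mean_off, var_off):
    """Threshold that yields the requested false-alarm probability."""
    return mean_off + math.sqrt(var_off) * NormalDist().inv_cdf(1.0 - p_fa)

# Illustrative numbers only: unit peak, zero-mean sidelobes, equal variances.
p_d, p_fa, p_e = detection_measures(0.5, 1.0, 0.04, 0.0, 0.04)
theta_001 = threshold_for_pfa(0.001, 0.0, 0.0625)
```

With the threshold midway between the two means and equal variances, P_e equals P_FA, which is a convenient sanity check on the equal-prior error expression.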

These three probabilistic performance measures can be easily expressed as a function of the
correlation plane SNR values:

$$\mathrm{SNR}_1 = \frac{E^2[C(0)]}{\mathrm{var}[C(0)]}, \qquad (19)$$

$$\mathrm{SNR}_2 = \frac{E^2[C(0)]}{\mathrm{var}[C(\tau)]\big|_{\tau \gg 0}}, \qquad (20)$$


which can easily be evaluated and experimentally measured. SNR_1 is the SNR at the peak (the
conventional communications definition, Ref. 15), and SNR_2 is similar to the
peak-to-sidelobe ratio (with the sidelobe or noise level measured at τ ≫ 0, far from the
peak). Both of these SNR measures have been used previously (Refs. 16 and 17), and SNR_2 can
be directly related to SNR_1. In terms of these SNRs, one can show


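$$P_D = \frac{1}{\sqrt{2\pi E^2[C(0)]/\mathrm{SNR}_1}} \int_{\theta}^{\infty} \exp\left\{-\frac{\mathrm{SNR}_1\,[x - E[C(0)]]^2}{2E^2[C(0)]}\right\} dx, \qquad (21)$$

$$P_{FA} = \frac{1}{\sqrt{2\pi E^2[C(0)]/\mathrm{SNR}_2}} \int_{\theta}^{\infty} \exp\left\{-\frac{\mathrm{SNR}_2\,[x - E[C(\tau)]]^2}{2E^2[C(0)]}\right\} dx. \qquad (22)$$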


We now consider the second correlator application: delay estimation. The location of the
correlation peak contains useful target location and signal synchronization information,
and thus one refers to the target location or signal delay estimation performance of a TI
AO correlator. In image correlations, this performance measure is referred to as
registration error (Ref. 17), since it describes the accuracy with which the location of an
object in an image is known or the accuracy to which two images can be registered. To
develop an expression for the delay estimation error e, we define the exact peak location as
τ_0, and we denote the estimated peak location by τ̂_0. Then

$$e = (\tau_0 - \hat{\tau}_0). \qquad (23)$$

We will denote other observed parameters by a caret (^) (i.e., the observed correlation
function is Ĉ, whereas the exact or ideal correlation function is C). To analyze e, we must
calculate its expected value and its variance. The first parameter determines if the
estimator is biased (time bias), whereas the second provides us with the variation to be
expected in calculating e. We thus use both E[e] and var[e] as delay-estimation performance
measures.

VI.  Error-free  Detection  Analysis 

In this section, we derive general expressions for the mean and variance of the correlation
output. We then evaluate these expressions for a zero-mean Gaussian signal model with a
Gaussian-shaped autocorrelation function. We then evaluate SNR_1 and SNR_2 and plot P_D and
P_FA as functions of the basic TI AO system parameters.

For both active and passive signal processing, the transmitted s_1(t) and received s_2(t)
signals to be correlated are denoted by

$$s_1(t) = s(t), \qquad (24)$$

$$s_2(t) = s(t - \tau_0) + n(t), \qquad (25)$$

where τ_0 is the time delay between the two signals, and n(t) is additive noise. Assuming
equal biases (B_1 = B_2 = B) for simplicity, Eq. (4) becomes

$$\begin{aligned}
\hat{C}(\tau) = B^2 &+ \frac{B}{T_I}\int_{-T_I/2}^{T_I/2} s(t-\tau_0)\,dt + \frac{B}{T_I}\int_{-T_I/2}^{T_I/2} n(t)\,dt \\
&+ \frac{B}{T_I}\int_{-T_I/2}^{T_I/2} s(t-\tau)\,dt + \frac{1}{T_I}\int_{-T_I/2}^{T_I/2} s(t-\tau)\,n(t)\,dt \\
&+ \frac{1}{T_I}\int_{-T_I/2}^{T_I/2} s(t-\tau)\,s(t-\tau_0)\,dt. \qquad (26)
\end{aligned}$$
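A discrete-time sketch may help to show how the bias, the cross terms, and the signal autocorrelation term of Eq. (26) combine in a time-integrating correlation. It assumes white Gaussian samples (so the signal has no correlation between samples), a unit bias, and a hypothetical delay of seven samples; none of these values come from the paper.

```python
import random

def ti_correlate(s1, s2, bias, max_lag):
    """Time average over t of (B + s1[t - tau]) * (B + s2[t]) for tau = 0..max_lag."""
    n = len(s2)
    out = []
    for tau in range(max_lag + 1):
        acc = 0.0
        for t in range(max_lag, n):
            acc += (bias + s1[t - tau]) * (bias + s2[t])
        out.append(acc / (n - max_lag))
    return out

random.seed(0)
N, TRUE_DELAY, BIAS = 4000, 7, 1.0
s = [random.gauss(0.0, 1.0) for _ in range(N)]          # transmitted signal s(t)
noise = [random.gauss(0.0, 0.5) for _ in range(N)]      # additive noise n(t)
received = [(s[t - TRUE_DELAY] if t >= TRUE_DELAY else 0.0) + noise[t] for t in range(N)]

c = ti_correlate(s, received, BIAS, max_lag=20)
c_unbiased = [v - BIAS ** 2 for v in c]                 # subtract the B^2 bias term
peak_lag = max(range(len(c_unbiased)), key=c_unbiased.__getitem__)
```

After the B² term is subtracted, the remaining bias-weighted terms average toward zero and the peak of `c_unbiased` sits at the true delay with a height close to the signal power.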

Since s(t) and n(t) are zero-mean signals and assuming that they are independent random
processes, the expected value of the correlation peak (assumed to occur at τ_0 = 0 with no
loss of generality) becomes

$$E[\hat{C}(0)] = \frac{1}{T_I}\int_{-T_I/2}^{T_I/2} E[s(t)s(t)]\,dt = R_s(0), \qquad (27)$$

where R_s(τ) is the signal autocorrelation function, and where we assumed that the bias term
B² was subtracted from the output. In general, we find E[Ĉ(τ)] = R_s(τ), and thus (with bias
subtraction) the estimator is unbiased, i.e., the average value of the estimated correlation
equals the correlation we are trying to estimate. The variance of Ĉ(τ) is easily found to be

$$\begin{aligned}
\mathrm{var}[\hat{C}(\tau)] &= E[\hat{C}^2(\tau)] - E^2[\hat{C}(\tau)] \\
&= \frac{B^2}{T_I^2}\int_{-T_I}^{T_I} (T_I - |z|)\Big[2R_s(z) + R_n(z) + \frac{1}{B^2}R_s^2(z) + \frac{1}{B^2}R_s(z+\tau)R_s(z-\tau) \\
&\qquad\qquad + \frac{1}{B^2}R_s(z)R_n(z) + 2R_s(z-\tau)\Big]\,dz, \qquad (28)
\end{aligned}$$

where the assumption that s(t) is Gaussian distributed (this makes the third-order moments
zero) and the fourth-order moment theorem for Gaussian random variables (Ref. 18) were used.

We now consider the evaluation of our output SNR_O measures in Eqs. (19) and (20) for the
specific case of a Gaussian signal with a Gaussian-shaped autocorrelation function. In this
case, R_s(τ) has the form

$$R_s(\tau) = R_0 \exp(-\pi\beta^2\tau^2), \qquad (29)$$

where β is the signal's 3-dB bandwidth (BW) and R_0 is the signal power. We assume that the
noise has power R_n and has a statistical autocorrelation function that is of the same form
as in Eq. (29). In this case, the input SNR is SNR_I = R_0/R_n. A more complex SNR_I
expression results (Ref. 16) if this assumption is not valid. Substituting Eq. (29) into
Eqs. (27) and (28), evaluating at τ = 0, and assuming T_I ≫ 1/β (i.e., a signal
time-bandwidth product TBW = T_I β > 10), we find


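$$\mathrm{SNR}_1 = \frac{T_I\beta}{\dfrac{1}{\sqrt{2}}\left(2 + \dfrac{1}{\mathrm{SNR}_I}\right) + \dfrac{1}{(\mathrm{SBR})^2}\left(4 + \dfrac{1}{\mathrm{SNR}_I}\right)}, \qquad (30)$$

where SBR = √R_0/B is the signal-to-bias ratio. The SNR_2 expression is identical to Eq.
(30) with (2 + 1/SNR_I) replaced by (1 + 1/SNR_I). We note that when SBR and SNR_I are
infinite, SNR_2 is greater than SNR_1 by 3 dB.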
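The dependence of the output SNR on TBW, input SNR, and SBR can be explored numerically. The sketch below assumes that Eq. (30) has the form SNR_O = TBW / [(k + 1/SNR_I)/√2 + (4 + 1/SNR_I)/SBR²], with k = 2 for SNR_1 and k = 1 for SNR_2; this form is an assumed reading that follows from Eqs. (27)-(29), and the function names are hypothetical.

```python
import math

def output_snr(tbw, snr_in, sbr, peak=True):
    """Output SNR under the assumed form of Eq. (30).

    peak=True gives SNR_1 (variance measured at the peak);
    peak=False gives SNR_2 (variance measured far from the peak).
    """
    k = 2.0 if peak else 1.0
    signal_term = (k + 1.0 / snr_in) / math.sqrt(2.0)
    bias_term = (4.0 + 1.0 / snr_in) / sbr ** 2
    return tbw / (signal_term + bias_term)

def to_db(ratio):
    return 10.0 * math.log10(ratio)

INF = float("inf")
gap_db = to_db(output_snr(1000.0, INF, INF, peak=False) / output_snr(1000.0, INF, INF))
bias_loss_db = to_db(output_snr(1000.0, 0.1, INF, peak=False)
                     / output_snr(1000.0, 0.1, 0.5, peak=False))
```

Under this assumed form, the SNR_2-to-SNR_1 gap at infinite SBR and SNR_I is 3 dB, and the loss from SBR = ∞ to SBR = 0.5 does not depend on TBW, since TBW appears only as a common factor.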

From Eq. (30), we note that both SNR_O measures increase as any of T_I, β, TBW, SNR_I, or
SBR increases. These results are as expected. In Fig. 2, we plot SNR_2 vs TBW for several
SBR values. These plots are important since they quantify the rather severe SNR_O loss for
SBR < ∞. SNR_O is higher with AO cell amplitude modulation than with intensity modulation.
Here we quantify this effect and the effect of SBR on SNR_O. The case of no bias (i.e.,
SBR = ∞) corresponds to AO cell amplitude modulation. SBR = 0.5 corresponds to the best case
one can obtain using AO cell intensity modulation. Comparing these two cases, we find an
SNR_O loss of 8 dB for most TBWs.

Fig. 2. Effect of SBR and signal TBW on output SNR_2 for a time-integrating correlator.

Expressions for P_D and P_FA are obtained by substituting Eq. (30) into Eqs. (21) and (22).
We normalize E[C(0)] to unity and consider E[C(τ)] to be zero for τ ≫ 0. In Fig. 3, we show
P_D (for P_FA = 0.001) and P_FA (for P_D = 0.999) vs TBW for three SNR_I values (i.e., 0.1,
1.0, and 10.0). From these curves, we see that P_D (P_FA) increases (decreases)
monotonically as TBW or SNR_I increases, thus improving the detection performance of the
system. Such trends are well known. In Fig. 4, we show P_D (for P_FA = 0.001) and P_FA (for
P_D = 0.999) as a function of TBW and SBR. As expected, P_D (P_FA) increases (decreases) as
SBR increases. For the best practical intensity modulation case (SBR = 0.5), both P_D and
P_FA are quite inferior to the P_D and P_FA values for SBR = ∞. For TBW > 5000, the
difference is small. For TBW < 1000, the difference is quite appreciable (i.e., for
TBW = 450, P_D = 0.35 vs 0.999, and P_FA = 0.65 vs 0.00001). From these data, the TBW
increase required to achieve a given performance for a given SBR can be found.

Fig. 3. Effect of input SNR_I and signal TBW on detection performance for a
time-integrating correlator: (a) P_D (for P_FA = 0.001) and (b) P_FA (for P_D = 0.999).

Fig. 4. Effect of SBR and signal TBW on the detection performance of a time-integrating
correlator: (a) P_D (for P_FA = 0.001) and (b) P_FA (for P_D = 0.999).

VII.  Error-free  Delay  Estimation  Analysis

We now analyze the delay estimation performance of the correlator with attention to the
calculations of E[e] and var[e], their relationships to measurable Ĉ(τ) correlation data,
and their dependence on system parameters.

To relate Ĉ(τ) to e, we expand Ĉ(τ) in a Taylor series around τ = τ_0. We consider τ̂_0 ≈ τ_0
and thus ignore terms higher than (τ̂_0 - τ_0)² in our expansion. Setting the derivative
Ĉ'(τ) of Ĉ(τ) equal to zero, we find

$$e = (\tau_0 - \hat{\tau}_0) = -\hat{C}'(\tau_0)/\hat{C}''(\tau_0), \qquad (31)$$

where ' and " denote first and second derivatives. From the gradient Ĉ'(τ_0) and curvature
Ĉ''(τ_0) of the correlation at the peak, we can thus obtain e. We consider the case when the
standard deviation of the curvature Ĉ''(τ_0) of the correlation at the true peak is small.
In this case, Ĉ''(τ_0) is well approximated by its average value. We thus write

$$e \approx -\hat{C}'(\tau_0)/\bar{C}''(\tau_0), \qquad (32)$$

where the average value C̄''(τ_0) = E[Ĉ''(τ_0)] is used, since it can be more easily
evaluated (Ref. 18) for our statistical signals.

To describe the delay estimation performance, we require the mean and variance of e in Eq.
(32) in terms of measurable correlator parameters. To develop this, we denote the two parts
of the correlation output in Eq. (26) (for an intensity modulation scheme) by C_s(τ) [the
autocorrelation of s(t), or the last term in Eq. (26)] and C_n(τ) [the terms remaining in
Eq. (27) after bias subtraction, i.e., only the fifth term in Eq. (26)]. In terms of these
quantities,

[equation illegible in source]  (33)

From Eq. (33), we see that the estimate is unbiased. This follows since C_s is a maximum at
τ_0, and thus its gradient is zero at τ_0. The second term in the numerator of Eq. (33) is
zero since it corresponds to the zero-mean terms in Eq. (26). Next, we evaluate the variance
of e, which (from the above) simplifies to

$$\mathrm{var}[e] \approx \{E[\hat{C}''(\tau_0)]\}^{-2}\,\{E[|C_s'(\tau_0)|^2] + E[|C_n'(\tau_0)|^2]\}. \qquad (34)$$

To evaluate Eq. (34) in terms of system parameters, we considered the case of a
Gaussian-distributed signal and noise with a Gaussian-shaped autocorrelation function given
by Eq. (29). For this case, a lengthy but straightforward analysis shows

[equation illegible in source]  (35)

where equal signal and noise bandwidths β and TBW ≥ 10 were assumed.

Fig. 5. Effect of integration time T_I and signal BW on the variance var[e] of the delay
estimation error e for a time-integrating correlator.

Equation (35) relates the delay estimation accuracy of a TI AO correlator to the various
system parameters. To quantify the delay estimation performance and the effect of the
various system parameters, we include Fig. 5. In Fig. 5, we show how var[e] varies with the
bandwidth β for four different T_I values (0.1, 0.5, 1.0, and 5.0 msec). From this figure,
we see that var[e] monotonically decreases as β or T_I increases, and, most important, this
plot quantifies these variations. The main new feature in these data is that var[e] depends
more on the signal bandwidth (var[e] ∝ 1/β³) than on the integration time (var[e] ∝ 1/T_I).
In retrospect, this might have been expected because of the well-known inverse dependence of
the width of the correlation peak on the bandwidth of the signal. However, the dependence
has now been quantified. In Fig. 6, we show var[e] vs bandwidth β for three SNR_I values
(0.1, 1.0, and 10.0). As expected, var[e] decreases as SNR_I or β increases. Finally, in
Fig. 7, we plot var[e] as a function of bandwidth β for two SBR values (0.1 and ∞). As we
see, var[e] exhibits a negligible dependence on SBR. This is a rather important result since
it demonstrates that AO correlators with amplitude (SBR = ∞) or intensity (SBR ≠ ∞)
modulation will yield equivalent delay estimation performance.

Fig. 6. Effect of SNR_I and signal BW on the variance var[e] of the delay estimation error e
for a time-integrating correlator.

Fig. 7. Effect of SBR and signal BW on the variance var[e] of the delay estimation error e
for a time-integrating correlator.
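The Taylor-series argument behind Eq. (31) amounts to a single Newton step: starting from a nominal delay, the stationary point of the locally quadratic correlation lies a distance -C'/C'' away. The sketch below applies this to a sampled correlation using central differences. The function name, the sample spacing, and the parabolic test peak are illustrative assumptions.

```python
def refine_peak(corr, dt, k):
    """One Newton step from sample k toward the peak of a sampled correlation.

    Central differences approximate C' and C''; the step taken is -C'/C''.
    """
    c_prime = (corr[k + 1] - corr[k - 1]) / (2.0 * dt)
    c_second = (corr[k + 1] - 2.0 * corr[k] + corr[k - 1]) / dt ** 2
    return k * dt - c_prime / c_second

# Hypothetical correlation peak: a parabola centred at tau = 0.43, sampled every 0.1.
DT, TRUE_PEAK = 0.1, 0.43
samples = [1.0 - (i * DT - TRUE_PEAK) ** 2 for i in range(10)]
estimate = refine_peak(samples, DT, k=4)   # nominal delay 0.4
```

For an exactly quadratic peak the step recovers the true location; noise in the gradient term then maps directly into delay error, scaled by the inverse of the curvature, which is why a sharper (wider-bandwidth) peak gives a smaller variance.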

VIII.  Summary 

In this paper, we have modeled and classified the error sources present in a
time-integrating acoustooptic correlator. We have shown that the error sources can best be
grouped into three major error classes: input plane errors, frequency plane errors, and
output detector plane errors. The effect of input plane errors was shown to be a weighting
of the correlation output. These errors were quantified, and we noted that such errors are
in general negligible and correctable. The frequency plane errors are uncorrectable and were
noted to result in a complex frequency domain weighting. The major such error was found to
be the nonideal transfer function of the acoustooptic cells. Most output detector plane
errors are uncorrectable and were noted to consist of detector limitations such as sampling,
area integration, and weightings.

We also suggested performance measures to describe the performance of a time-integrating
acoustooptic correlator in different applications. For detection applications, we employed
P_D, P_FA, and P_e and noted how to relate these measures to acoustooptic time-integrating
correlator parameters and measurable correlation output SNRs. For delay estimation
applications, we used the expected value and variance of the registration error. For these
two correlator applications, we demonstrated how simple statistical analyses allow
quantification of the effects of different system parameters. Using these formulations, we
conducted error-free statistical analyses for both detection and delay estimation. Our
analyses confirmed (and most important, quantified) well-known trends. They also provided
and quantified several new results. These included the effect of signal-to-bias ratio on
both detection and delay estimation performance. For detection applications, we found that
low SBR values significantly affect P_D and P_FA. For delay estimation applications, we
found SBR effects to be rather negligible. These trends affect the selection of amplitude or
intensity modulation modes for acoustooptic cells.

Our error-free statistical analyses provide valuable quantitative data. They also provide
the baseline performance levels against which to quantify the effect of the various
component error sources noted. The present paper has laid the framework for the component
error source analysis of acoustooptic correlators. A detailed error source analysis using
these guidelines, models, and performance measures is the subject of future research.

ABSTRACT

Parallel  optical  pattern  recognition  architectures  for  multi-class  distortion-invariant 
autonomous  target  recognition  (ATR)  are  described.  Algorithms  that  utilize  the  parallel 
outputs  and  real-time  processing  features  of  optical  systems  are  noted.  Three  hybrid 
optical/digital  feature  extraction  techniques  for  ATR  are  described  together  with  an 
optical  correlation  method  that  achieves  multi-class  shift-invariant  distortion-invariant 
object  identification.  Initial  results  on  selected  military  objects  are  included  in  the 
presentation.  Brief  remarks  on  optical  systolic  linear  algebra  processors  are  also 
advanced  as  they  apply  to  the  processing  requirements  for  ATR. 

1.  INTRODUCTION 

The  real-time,  parallel-processing,  low  size,  weight  and  power  dissipation  advantages  of 
optical  pattern  recognition  (OPR)  systems  for  ATR  have  long  been  recognized. 
Recently,  several  small  size  and  weight  real-time  optical  correlators  have  been  fabricated 
and  demonstrated.1,2  Thus,  the  technology  of  OPR  for  ATR  merits  attention  and 
discussion.  In  Section  2,  we  briefly  review  the  classic  Fourier  transform  (FT)  and 
correlation operations of such architectures. Section 3 considers three different hybrid
optical/digital feature extractors, and Section 4 considers a new optical correlator. In all
cases, these parallel architectures and algorithms achieve distortion-invariant multi-class
recognition. Recent performance results have been obtained for one feature extraction system
on non-controlled IR data and for the correlator of Section 4 in structured clutter. Optical
systolic linear algebra processors are then briefly noted in Section 5.

2.  A  REVIEW  OF  OPR

The system of Figure 1 is the classic OPR architecture. The FT G(u,v) of the input image
g(x,y) in P_1 appears at P_2, with higher input spatial frequencies (u,v) appearing at
radially increasing distances from the center of P_2. As the input translates, the
intensity-detected magnitude of the FT remains unchanged (it is shift-invariant). However,
as the input object rotates, so does the FT. These features of a coherent optical system are
exploited in all of our architectures to be described. In the full system of Figure 1, a
transparency proportional to the conjugate FT H*(u,v) of a reference object h(x,y) can be
recorded holographically at P_2. This is referred to as a matched spatial filter (MSF). The
light distribution incident on P_2 is G(u,v), and the light leaving P_2 is G(u,v)H*(u,v).
Thus passing one 2-D image plane through another achieves a 2-D point-by-point
multiplication. This feature is likewise constantly exploited in OPR systems. The FT of this
product of FTs is then formed at P_3, where the correlation of the two space functions g and
h results.
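The two properties used above, shift-invariance of the FT magnitude and correlation through a product of transforms, can be checked with a small one-dimensional digital analogue. This is only a sketch: it uses a direct discrete Fourier transform and an inverse transform for the last step (an optical system applies a second forward transform, which gives the same result with the output coordinates reversed). The function names and the test sequence are hypothetical.

```python
import cmath

def dft(x, inverse=False):
    """Direct O(N^2) discrete Fourier transform (or its inverse)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def correlate_via_ft(g, h):
    """Circular cross-correlation of g with h computed as IDFT{G(u) * conj(H(u))}."""
    G, H = dft(g), dft(h)
    product = [a * b.conjugate() for a, b in zip(G, H)]   # point-by-point multiplication
    return [v.real for v in dft(product, inverse=True)]

reference = [0, 1, 3, 1, 0, 0, 0, 0]
shifted = reference[-3:] + reference[:-3]                 # reference moved by 3 samples
corr = correlate_via_ft(shifted, reference)
peak_shift = max(range(len(corr)), key=corr.__getitem__)
mag_reference = [abs(v) for v in dft(reference)]
mag_shifted = [abs(v) for v in dft(shifted)]
```

The correlation peak appears at the displacement of the input, while the two magnitude spectra are identical, which is the behaviour the feature extractors of the following sections rely on.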
FIGURE  1  Coherent  optical  Fourier  transform  and  correlation

3.  OPTICAL  FEATURE  EXTRACTION

The classic approach to pattern recognition employs a training set of imagery from which
features are extracted and subsequently operated upon to determine class and orientation
estimates of input objects and the confidence of these estimates. A hybrid optical/digital
architecture in which the image features are optically computed in parallel is shown
schematically in Figure 2.

FIGURE 2 Simplified diagram of a hybrid optical/digital feature-space pattern recognition
processor


Such  an  architecture  is  attractive  because  it  can  provide  orientation  information  on  the 
input  object  and  because  the  same  optical  system  can  be  used  for  different  object  classes. 
With  the  proper  digital  post-processor,  distortion-invariance  and  multi-class  recognition 
can  be  achieved.  We  now  discuss  three  versions  of  parallel  optically-computed  features 
and  the  associated  digital  post-processor  system  required. 


3.1.  Fourier-Coefficient  Feature-Space 

The shift-invariance of the FT magnitude, coupled with the change in scale and rotation of
the FT pattern with changes in the scale and orientation of the input image, can be utilized
for feature-space pattern recognition. The anatomy of an optical FT dictates that wedge ring
detector (WRD) sampling at P_2 of Figure 1 provides data compression and dimensionality
reduction in a Fourier-coefficient feature-space, plus scale invariance (from the wedge
data) and rotation invariance (from the ring data). Many uses of this technique have been
detailed.

FIGURE 3 Block diagram of a hybrid optical/digital WRD-sampled Fourier-coefficient
feature-space pattern recognition system

The most recent work used the system block diagram of Figure 3, in which the amplitude, the
phase, or both the amplitude and phase of the FT were used as the observation space. WRD
sampling provided dimensionality reduction to 64 image features (32 wedge and 32 ring data
elements). Feature extraction involves three techniques:

1.  projection  of  the  feature  vector  onto  the  dominant  Karhunen-Loeve  (KL)5 
eigenvectors  per  object  class; 

2.  projection  onto  a  Fukunaga-Koontz  (FK)6  discriminant  vector  for  each  class, 
with  FK  feature  vectors  calculated  only  from  the  dominant  KL  eigenvector; 
and 

3.  projection  onto  the  Foley-Sammon  (FS)7  discriminant  vector,  calculated  from 
the  dominant  KL  eigenvectors  only. 
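The first technique in the list, projection onto a dominant KL eigenvector, can be sketched with a power iteration on the class correlation matrix. This is a minimal illustration only: it assumes the KL basis is taken from the sample correlation matrix of the training feature vectors, finds just the single dominant eigenvector, and uses hypothetical two-element feature vectors rather than 64-element WRD data.

```python
import math

def correlation_matrix(vectors):
    """Sample correlation matrix (1/N) * sum of x x^T over the training vectors."""
    n = len(vectors[0])
    return [[sum(x[i] * x[j] for x in vectors) / len(vectors) for j in range(n)]
            for i in range(n)]

def dominant_eigenvector(matrix, iterations=200):
    """Power iteration for the eigenvector with the largest eigenvalue."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def project(feature, basis_vector):
    """Vector inner product of a feature vector with a basis vector."""
    return sum(a * b for a, b in zip(feature, basis_vector))

training = [[2.0, 2.1], [3.0, 2.9], [1.0, 1.1], [4.0, 4.0]]   # hypothetical class data
kl_vector = dominant_eigenvector(correlation_matrix(training))
feature_value = project([2.0, 2.0], kl_vector)
```

The eigenvector computation belongs to the off-line training stage; at run time only the inner product in `project` is needed.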

The three different feature extractors noted in Figure 3 were evaluated using two different
data sets (50 different vehicles in two classes and 50 different letters of two types, each
with different scales and orientations). The KL feature extractor was found to give good
intra-class performance, but better Fisher ratio performance measures resulted when the FK
and FS unitary transformations were employed. Amplitude Fourier-coefficient features were
found to be more robust in the presence of noise than were phase features. This is
attributed to the concentration of the dominant Fourier-amplitude coefficients into a few
WRD samples, whereas phase Fourier-plane data are more evenly distributed over all WRD
samples. An extensive tabulation and analysis of these data are available elsewhere
(Ref. 8). The performance obtained is not the major concern at present; rather, the
flexibility of digital analysis of Fourier coefficients that are optically produced in
parallel is the major message to be conveyed. These features are easily produced in parallel
on the simplest coherent optical processor. Dimensionality reduction of these features is
employed to simplify the digital post-processing required. Only simple vector inner product
operations are needed in the post-processor, with computation of the discriminant functions
and transformation matrices required being performed off-line on training set data.
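The dimensionality reduction provided by wedge and ring sampling can be illustrated digitally. The sketch below sums a square Fourier-plane intensity array into ring and wedge bins. It is a simplification under stated assumptions: a physical WRD uses rings on one half-plane and wedges on the other, whereas here the two symmetric half-planes are folded together, and the array size and bin counts are hypothetical.

```python
import math

def wrd_sample(intensity, n_rings, n_wedges):
    """Sum a square FT-plane intensity array into ring and wedge features."""
    size = len(intensity)
    center = (size - 1) / 2.0
    max_radius = size / 2.0
    rings = [0.0] * n_rings
    wedges = [0.0] * n_wedges
    for row in range(size):
        for col in range(size):
            dx, dy = col - center, row - center
            radius = math.hypot(dx, dy)
            if radius == 0.0 or radius >= max_radius:
                continue  # skip the dc term and the corners outside the detector
            ring = int(radius / max_radius * n_rings)
            angle = math.atan2(dy, dx) % math.pi  # fold the symmetric half-planes together
            wedge = min(int(angle / math.pi * n_wedges), n_wedges - 1)
            rings[ring] += intensity[row][col]
            wedges[wedge] += intensity[row][col]
    return rings, wedges

def point_pattern(size, row, col):
    plane = [[0.0] * size for _ in range(size)]
    plane[row][col] = 1.0
    return plane

rings_a, wedges_a = wrd_sample(point_pattern(9, 4, 7), 4, 4)
rings_b, wedges_b = wrd_sample(point_pattern(9, 7, 4), 4, 4)  # same point rotated by 90 degrees
```

Rotating the pattern leaves the ring features unchanged and moves energy between wedges, which is the sense in which ring data are rotation-invariant.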


3.2.  Chord-Histogram  Distribution  Feature-Space 
The chords of an object boundary define the object's shape and are useful image features
(Refs. 9 and 10). Each chord is described by two parameters (its length r and angle θ). The
distribution h(r,θ) of all chords thus defines the shape of the boundary. Denoting a
boundary point on an object by b(x,y) = 1, then

$$g(x,y,r,\theta) = b(x,y)\,b(x + r\cos\theta,\; y + r\sin\theta) = 1 \qquad (1)$$

defines a chord. The chord distribution is simply the integral

$$h(r,\theta) = \iint g(x,y,r,\theta)\,dx\,dy. \qquad (2)$$

Substituting (ξ,η) = (r cos θ, r sin θ) into (2), the chord distribution is seen to be the
autocorrelation of the object's boundary. Optical systems easily perform the autocorrelation
operation on the system of Figure 1, in a joint transform correlator, or from the FT of the
magnitude of an FT. Since optical systems perform a correlation on the full grey-scale image
rather than on just the object's boundary, a generalized chord distribution function can be
obtained optically (Ref. 11). We WRD-sample the optical autocorrelation plane to
simultaneously obtain the h(r) and h(θ) chord distributions and a reduced-dimensionality
feature space. These distributions provide invariance to object rotations and scales,
respectively.
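The statement that the chord distribution is the autocorrelation of the boundary can be checked directly on a small binary image. The sketch below counts chords by their displacement (ξ, η) and then reduces the result to a length histogram h(r). The function names and the four-point test boundary are hypothetical.

```python
import math

def chord_distribution(boundary):
    """h[(xi, eta)] = sum over (x, y) of b(x, y) * b(x + xi, y + eta)."""
    points = [(x, y) for y, row in enumerate(boundary) for x, v in enumerate(row) if v]
    counts = {}
    for (x1, y1) in points:
        for (x2, y2) in points:
            key = (x2 - x1, y2 - y1)
            counts[key] = counts.get(key, 0) + 1
    return counts

def length_histogram(counts):
    """Collapse the 2-D chord distribution onto chord length r."""
    hist = {}
    for (xi, eta), value in counts.items():
        r = round(math.hypot(xi, eta), 3)
        hist[r] = hist.get(r, 0) + value
    return hist

corners = [[1, 0, 1],
           [0, 0, 0],
           [1, 0, 1]]
h = chord_distribution(corners)
h_r = length_histogram(h)
```

Each entry of `h` is one sample of the boundary autocorrelation; `h_r` depends only on chord lengths and so is unchanged when the boundary is rotated by a multiple of 90 degrees on this grid.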

The hybrid optical/digital system block diagram shown in Figure 4 uses the chord
distributions optically generated in parallel together with a vector inner product of the
observed feature vector and a Fisher discriminant vector w for feature extraction (Ref. 11).
Comparison of the vector projection value to a threshold determined from the training set
data determines the class of the input object. As before, the post-processor must perform
only a vector inner product since calculation of the Fisher discriminant vector is performed
off-line on training set data. The major multi-class database on which most of the results
noted were obtained consists of five different ship classes from a 00° depression angle with
36 images per class (at 10° aspect intervals). Extensive data (Ref. 11), summarized in
Table 1, were obtained with the system of Figure 4.

FIGURE 4 Block diagram of a hybrid optical/digital generalized chord histogram feature-space
pattern recognition system


To compute the Fisher discriminant vector w with a reduced number of training set
images, we selected the 18 dominant WRD features and used 12-18 different training set
images per class (tests 1-3 in Table 1). The results indicate that perfect
classification of all 72 images in the two ship classes tested can be obtained with as
few as 12 training set images per class. Such excellent recognition and classification
of multiple object classes in the face of 3-D out-of-plane aspect distortions is typical
of the performance that is possible with parallel optical feature extractors.


TABLE 1: Test results obtained with a generalized chord feature space on the
72 images in the first 2 ship image classes

TEST     NUMBER OF TR SET        TR SET SELECTION                     NUMBER OF   PERCENT CORRECT
NUMBER   IMAGES USED PER CLASS   REMARKS                              ERRORS      CLASSIFICATION

1        18                      Image every 20°                      0           100%
2        12                      Image every 20° (150°) (broadside)   8           88.9%
3        12                      Image every 30°                      0           100%


3.3.  Moment  Feature-Space 

The  geometrical  intensity  moments  of  an  object  f(x,y)  are  defined  by 


$$ m_{pq} = \iint f(x,y)\, x^p y^q \,dx\,dy. \qquad (3) $$

These  features  are  used  in  nearly  all  computer  vision  systems.13  The  moment  feature 
vector  m  can  be  computed  optically  in  parallel  on  the  system  of  Figure  5. 
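
A digital analogue of the moment definition in (3) can be sketched as follows. This is a
minimal illustration, assuming a sampled image on an integer pixel grid with the origin
at the top-left pixel; the function name `geometric_moments` and the choice of origin are
assumptions of the example, not part of the optical system. Each monomial $x^p y^q$
plays the role of one mask.

```python
import numpy as np

def geometric_moments(f, max_order):
    """Return {(p, q): m_pq} for all p + q <= max_order, where
    m_pq = sum over pixels of f(x, y) * x**p * y**q."""
    rows, cols = f.shape
    y, x = np.mgrid[0:rows, 0:cols].astype(float)
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            mask = x ** p * y ** q          # one monomial mask
            moments[(p, q)] = float(np.sum(f * mask))
    return moments

# Toy image (hypothetical): a single pixel of value 3 at x = 2, y = 1.
image = np.zeros((4, 4))
image[1, 2] = 3.0
m = geometric_moments(image, max_order=2)
```

For this single-pixel image the moments reduce to `3 * 2**p * 1**q`, which makes the
result easy to check by hand.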


FIGURE 5 Schematic diagram of an optical moment-based feature
generation system


With different monomial masks $g(x,y) = x^p y^q$ present on different spatial frequency
carriers at P2, the P3 output pattern

$$ \iint f(x,y)\,g(x,y)\,dx\,dy \qquad (4) $$

corresponds to the moments of the P1 input f(x,y), each located at a spatially different
position in P3. The parallel moment computer of Figure 5 is attractive because the
computed moments can be corrected for various optical system errors in a simple
matrix-vector post-processor.12 The architecture of Figure 4 can be fabricated in a small
system occupying 330 in³, or much less volume if needed.

The observed moment features m, computed optically in parallel, are fed to the two-level
classifier of Figure 6.




FIGURE  6  Full  hybrid  optical/digital  moment  feature-space 
two-level  classifier  pattern  recognition  system 

In the first-level classifier,14,15 the central moment ratio $\mu_{20}/\mu_{02}$ is used
to estimate the aspect ratio of the input object, and a hierarchical node tree is used
to provide class estimates. The node selection is automated from scatter plots onto a
multi-dimensional Fisher space obtained from the moments for the training set data. The
branch selection is automatically determined from similar two-class Fisher projections.
This first-level classifier reduces the number of aspect view classes that the
second-level classifier must handle. It also allows the jointly Gaussian random variable
nature of the m features with respect to sampling to be employed in a Bayesian
classifier. The discriminant function calculated in the second-level classifier for each
aspect view class i is

$$ g_i(m) = [m - m_i(b)]^T \Sigma_i^{-1} [m - m_i(b)], \qquad (5) $$

where $\Sigma_i$ is the covariance matrix for class i, m is the observed moment vector
and $m_i(b)$ denotes the reference moment vector for class i with distortions described
by the distortion vector $b = (x_0, y_0, a, b, R, \theta)$, where $(x_0, y_0)$ are
translations of the input object, (a,b) are its horizontal and vertical scale changes, R
is its range and $\theta$ is its in-plane rotation angle. For each aspect view class,
(5) is evaluated for an initial estimate $b^0$ obtained from m, and a new estimate
$b^{k+1}$ at iteration k is calculated from the nonlinear estimator

$$ b^{k+1} = b^k + [(J^k)^T \Sigma_i^{-1} J^k]^{-1} (J^k)^T \Sigma_i^{-1} [m - m_i(b^k)]. \qquad (6) $$

Eq. (5) is then evaluated for this new b estimate and the process is continued. The
aspect class i and the distortions b that yield the lowest $g_i(m)$ define the object
class, aspect and distortion parameter estimates. The $g_i(m)$ value is a measure of the
confidence of our estimate.
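
The iterative estimator can be sketched numerically as a Gauss-Newton loop. This is a
minimal sketch under stated assumptions: the matrix J in the update is taken to be the
Jacobian of the reference moment model with respect to the distortion parameters
(approximated here by finite differences), the two-parameter `toy_model` is purely
hypothetical and stands in for the real reference moment vector, and the inverse
covariance is set to the identity.

```python
import numpy as np

def numerical_jacobian(model, b, eps=1e-6):
    """Forward-difference Jacobian of model(b) with respect to b."""
    m0 = model(b)
    jac = np.zeros((m0.size, b.size))
    for j in range(b.size):
        step = np.zeros_like(b)
        step[j] = eps
        jac[:, j] = (model(b + step) - m0) / eps
    return jac

def estimate_distortion(m_obs, model, cov_inv, b0, iterations=6):
    """Iterate the weighted least-squares update and return the final
    parameter estimate together with the quadratic discriminant value."""
    b = np.asarray(b0, dtype=float)
    for _ in range(iterations):
        jac = numerical_jacobian(model, b)
        residual = m_obs - model(b)
        normal = jac.T @ cov_inv @ jac
        b = b + np.linalg.solve(normal, jac.T @ cov_inv @ residual)
    residual = m_obs - model(b)
    return b, float(residual @ cov_inv @ residual)

def toy_model(b):
    """Hypothetical reference 'moment' vector depending on two parameters."""
    return np.array([b[0], b[1], b[0] * b[1], b[0] ** 2])

b_true = np.array([1.2, 0.7])
m_observed = toy_model(b_true)
b_hat, g_value = estimate_distortion(m_observed, toy_model, np.eye(4), [1.0, 1.0])
```

In a multi-class setting the same loop would be run once per aspect view class, and the
class with the lowest returned discriminant value would be selected.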

Excellent performance (over 90% correct class recognition) has been obtained with this
parallel algorithm and architecture for our 180 image ship database,15 for a 32 image
five-class pipe-part robotic database14 and on non-controlled real infrared imagery.20 In
each case, 3-D aspect distortions of all aspects over all 360° were used. The algorithm
in (5) and (6) requires 6500 operations per iteration and in general requires only six
iterations. Thus, the full architecture of Figure 6 is quite parallel, efficient and
automated, has a sound theoretical basis, and has demonstrated excellent initial
performance results.


4.  DISTORTION-INVARIANT  OPTICAL  CORRELATORS 

The optical correlator of Figure 1 has the multi-object capability, processing gain, and
performance and noise features noted earlier. To form an MSF at P2 of Figure 1 that
yields a distortion-invariant correlation, we employ a training set of images $\{f_n\}$
of different distorted versions of the object f in one class. We form the correlation
matrix $R_1$ for this data set and restrict the filter function h(x,y) to be a linear
combination of the training set data

$$ h(x,y) = \sum_n a_n f_n(x,y). \qquad (7) $$

The MSFs with distortion-invariant features that we discuss herein are referred to as
synthetic discriminant functions (SDFs). Five different types of SDFs have been
defined.16 We now emphasize the algorithms to produce these SDFs and briefly discuss
their performance. The SDF synthesis is computed off-line on training set data. Once
synthesized, the SDF can be used on-line in the parallel correlator of Figure 1 with no
additional computational overhead.

To produce a filter function h such that its correlation with all $\{f_n\}$ is a
constant value of unity, i.e.

$$ f_n \circledast h = 1 \quad \text{for all } n, \qquad (8) $$

the filter in (7) is defined by

$$ a = R_1^{-1} u, \qquad (9) $$

where u is the unit vector (all elements equal to one). This is referred to as an equal
correlation peak (ECP) SDF. It is useful for intra-class pattern recognition. To achieve
inter-class discrimination with one object per class, we desire N filters h for an
N-class problem such that


$$ f_n \circledast h_i = \delta_{ni}. \qquad (10) $$

These mutual orthogonal function (MOF) SDFs are defined by

$$ a_i = R_2^{-1} u_i, \qquad (11) $$

where $R_2$ is the full correlation matrix of the N object classes, $a_i$ denotes the
selection of coefficients for filter i, and $u_i$ contains all zeroes with a single 1 in
location i. The extension to intra-class MOF SDFs $h_j(x,y)$ follows directly. The filter
function is now a sum over all $N_1 N$ training set images ($N_1$ images per class and N
object classes). The coefficients of the filter $h_j$ are

$$ a_j = R_3^{-1} u_j, \qquad (12) $$

where $R_3$ is the full $N_1 N \times N_1 N$ correlation matrix and $u_j$ contains all
zeroes except for $N_1$ ones in the locations corresponding to the class j training set
images.
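
The ECP and MOF synthesis steps both amount to solving a linear system in the training
set correlation matrix. The sketch below is a minimal illustration under stated
assumptions: images are flattened to vectors, the "correlation" is taken as the
zero-shift inner product, the training images are random placeholders, and the function
name `sdf_coefficients` is hypothetical.

```python
import numpy as np

def sdf_coefficients(train, targets):
    """Solve R a = targets, where R holds the inner products of all pairs of
    training images (rows of `train`) and `targets` holds the desired
    correlation outputs, one column per filter."""
    corr_matrix = train @ train.T
    return np.linalg.solve(corr_matrix, targets)

rng = np.random.default_rng(0)
train = rng.random((4, 64))          # four toy training images, 64 pixels each

# ECP SDF: one filter with unit correlation output for every training image.
a_ecp = sdf_coefficients(train, np.ones(4))
h_ecp = a_ecp @ train                # h = sum_n a_n f_n

# MOF SDFs: filter i responds with 1 to image i and 0 to the other images.
a_mof = sdf_coefficients(train, np.eye(4))
h_mof = a_mof.T @ train              # one filter per row

ecp_outputs = train @ h_ecp          # expected: all ones
mof_outputs = train @ h_mof.T        # expected: identity matrix
```

Only the target vector changes between the two filter types, which reflects the remark
that the synthesis of the different SDFs is quite similar.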

Another SDF that achieves inter-class discrimination and intra-class recognition is the
multi-level nonredundant filter (NRF) SDF. In this filter, the correlation output is
allowed to assume different levels,

$$ f \circledast h = n \quad \text{for } f \text{ in class } n, \qquad (13) $$

where the value n of the correlation output determines the output class n. Synthesis of
the simple SDF to satisfy (13) is defined by

$$ a = R_3^{-1} u_n, \qquad (14) $$

where $u_n^T = [1,\ldots,1,2,\ldots,2,\ldots]$. To retain binary valued outputs, we can
employ a K-tuple NRF SDF. For M object classes, we require K filters, where
$2^K \ge M$. For the four-class case, K = 2 and the two SDFs $h_1$ and $h_2$ are defined
by the truth table


The solution for the filters is given by the solution of

$$ R_3 [a \;\; b] = [u_1 \;\; u_2], \qquad (17) $$

where $[u_1 \;\; u_2]$ is the full vector extension of the right-hand side of (15).

The synthesis of all five types of SDFs described above is quite similar and other
variants are obvious. Many test results have been obtained with these parallel
algorithms and architectures on our ship image database and on other military
objects.20 Excellent results have been obtained (over 90% correct object classification,
even in the presence of noise and real-world clutter) in all cases. This represents the
most attractive and promising technique for utilizing the full potential and parallel
processing possible with coherent optical pattern recognition architectures. It is a
practical and efficient processor and it achieves very high effective computation and
image frame rates.


5. OPTICAL LINEAR ALGEBRA SYSTOLIC PROCESSORS

The optical processors described thus far are quite powerful and appropriate for the
parallel realization of various pattern recognition algorithms. The most active topic in
optical computing research at present is optical linear algebra processors.21 These
architectures provide the basic framework for a general-purpose optical processor
capable of matrix-vector operations. This concept in parallel optical processing is the
equivalent of the digital array processor, in which arrays of data (matrices) are
operated on in parallel.

The optical system of Figure 7 is one example of such a processor. In this system an
array of input point modulators is imaged through separate regions of an acousto-optic
(AO) cell. With the input data representing a vector and the contents of the AO cell
being a matrix (N vectors, each on a separate temporal input carrier), the light leaving
the AO cell is the product of the input vector and matrix. The output lens forms the
sum of each vector product by spatial integration, and the matrix-vector product
appears on the linear detector array in parallel.
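
The multiply-then-integrate structure described above can be mimicked numerically. The
following is only an idealized sketch: it ignores the frequency multiplexing, noise,
dynamic range and non-negativity constraints of a real AO system, and the function name
`optical_matrix_vector` is a hypothetical label for the example.

```python
import numpy as np

def optical_matrix_vector(matrix, vector):
    """Idealized model: every matrix row modulates the input vector point by
    point (light passing through the cell), then the lens sums each row of
    products onto one detector element."""
    products = matrix * vector[np.newaxis, :]   # point-by-point modulation
    return products.sum(axis=1)                 # spatial integration

matrix = np.array([[1.0, 2.0, 3.0],
                   [0.5, 0.0, 1.5]])
vector = np.array([2.0, 1.0, 4.0])
detector_outputs = optical_matrix_vector(matrix, vector)
```

The result equals the ordinary matrix-vector product, here obtained with all output
elements formed at once rather than one at a time.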


Various realizations of this processor are detailed elsewhere.22 By frequency-, time-
and space-multiplexing, format control of the inputs to the system can be used to achieve
all of the fundamental operations in linear algebra. This flexible and general-purpose
processor can achieve in excess of 10 giga-operations (GOPs) per second. Alternate
architectures (with multi-channel AO cells) allow digital accuracy (32 bit) processing to
be achieved with this processor at comparable data rates.


FIGURE 7 Schematic diagram of a general-purpose optical linear
algebra processor


REFERENCES 

1. J. Upatnieks, "Portable Real-Time Coherent Optical Correlator", Applied
Optics, Vol. 22, pp. 2798-2803 (September 1983).

2. J.G. Duthie and J. Upatnieks, "Compact Real-Time Coherent Optical
Correlators", Optical Engineering, Vol. 23, pp. 7-11 (January/February
1984).

3. G.G. Lendaris and G.L. Stanley, "Diffraction-Pattern Sampling for
Automatic Pattern Recognition", Proc. IEEE, Vol. 58, pp. 198-216
(February 1970).

4. H. Kasdan and D. Mead, "Out of the Laboratory and into the Factory:
Optical Computing Comes of Age", Proc. EOSD, pp. 248-256 (1975).

5. S. Watanabe, 4th Prague Conference on Information Theory (1965).

6. K. Fukunaga and W.L.G. Koontz, "Application of the Karhunen-Loeve
Expansion to Feature Selection and Ordering", IEEE Trans. Comp., Vol.
C-19, pp. 311-317 (April 1970).

7. D.H. Foley and J.W. Sammon, "An Optimal Set of Discriminant Vectors",
IEEE Trans. Comp., Vol. C-24, pp. 281-289 (March 1975).

8. D. Casasent and V. Sharma, "Feature Extractors for Distortion-Invariant
Robot Vision", Opt. Engr., Vol. 23 (November 1984).

9. G. Tenery, "A Pattern Recognition Function of Integral Geometry", IEEE
Trans. Mil. Electron., Vol. ME-7, pp. 196-199 (1963).

10. D.J.H. Moore and D.J. Parker, "Analysis of Global Pattern Features",
Pattern Recognition, Vol. 6, pp. 149-164 (1974).

11. D. Casasent and W-T. Chang, "Generalized Chord Transformation for
Distortion-Invariant Optical Pattern Recognition", Applied Optics, Vol.
22, pp. 2087-2094 (July 1983).

12. D. Casasent, R.L. Cheatham and D. Fetterly, "Optical System to Compute
Intensity Moments: Design", Applied Optics, Vol. 21, pp. 3292-3298
(September 1982).

13. G.J. Agin, "Computer Vision Systems for Industrial Inspection and
Assembly", Computer, Vol. 13, pp. 11-20 (May 1980).

14. D. Casasent and R.L. Cheatham, "Hierarchical Pattern Recognition Using
Parallel Feature Extraction", Proc. Am. Soc. Mech. Engrs. Intl.
Computers in Engineering Conference and Exhibit (August 1984).

15. D. Casasent and R.L. Cheatham, "Hierarchical Fisher and Moment-Based
Pattern Recognition", Proc. Soc. Photo-Opt. Instr. Engrs., Vol. 604
(August 1984).

16. D. Casasent, "Unified Synthetic Discriminant Function Computational
Formulation", Applied Optics, Vol. 23, pp. 1620-1627 (May 1984).

17. D. Casasent and V. Sharma, "Shift-Invariant and Distortion-Invariant
Object Recognition", Proc. Soc. Photo-Opt. Instr. Engrs., Vol. 442,
pp. 47-55 (August 1983).

18. D. Casasent, W. Rozzi and D. Fetterly, "Deterministic Synthetic
Discriminant Functions: Performance", Optical Engineering, Vol. 23
(November 1984).

19. D. Casasent, W-T. Chang and D. Fetterly, "Deterministic Synthetic
Discriminant Functions for Pattern Recognition", Proc. Soc. Photo-Opt.
Instr. Engrs., Vol. 607 (August 1984).

20. D. Casasent and R.L. Cheatham, "Optical Moment-Based Feature
Extraction: Real Image Test Results", Optics Communications (Submitted
April 1984).

21. D. Casasent, "Acousto-Optic Linear Algebra Processors: Architectures,
Algorithms and Applications", Proc. IEEE, Vol. 72 (July 1984).

22. Proc. IEEE, Special Issue on Optical Computing, Vol. 72 (July 1984).




FOURIER-TRANSFORM  FEATURE-SPACE  STUDIES 

David  Casasent  and  Vinod  Sharma 

Carnegie-Mellon  University 
Department  of  Electrical  and  Computer  Engineering 
Pittsburgh,  Pennsylvania  15213 


ABSTRACT 


A hierarchical multi-level feature-space pattern recognition system is described.
Multi-class distortion-invariant object identification is the purpose of this study.
Attention is given to dimensionality reduction (to simplify computations) and to the use
of non-unitary transformations (to achieve discrimination). A Fourier transform feature
space is used. However, our basic hierarchical concepts, our theoretical analysis, and
our general conclusions are applicable to other feature spaces. The use of intensity
versus phase features is studied, as is the performance of our system in the presence of
noise. Quantitative experimental data on two two-class pattern recognition databases are
provided.

1.  INTRODUCTION 

Distortion-invariant multi-class pattern recognition is considered using a feature space.
Feature extraction, dimensionality reduction, discrimination and classification are
addressed.

A simplified block diagram of our hierarchical pattern recognition system is shown in
Figure 1. We begin with a Fourier transform feature space, since such a representation is
well-known [1] to allow significant data compression. We extract the amplitude, phase or
both from the Fourier transform plane. As the first dimensionality reduction technique,
we wedge ring detector (WRD) sample the Fourier transform plane data [2]. This reduces
the dimensionality of the feature space to 64. Next, we compute the dominant
eigenvectors of the WRD-sampled autocorrelation matrix. This reduced subspace is
calculated using a Karhunen-Loeve (K-L) transformation [3] or implemented by new
efficient techniques [4] for computing the dominant eigenvectors and eigenvalues of a
large matrix. This completes the dimensionality reduction steps in our system. To
provide discrimination, we employ two non-unitary transformations: the Fukunaga-Koontz
(F-K) [5] and the Foley-Sammon (F-S) [6]. Our classifier selects the best subspace
(based on the probability of error) from the K-L, F-K and F-S feature vectors.


FIGURE 1. General Fourier Transform (etc.) Feature-Extraction
Pattern Recognition System Block Diagram.


In Section 2, we review and highlight our two levels of dimensionality reduction and we
discuss how we achieve distortion-invariance. Our two discrimination algorithms are
detailed in Section 3. Brief theoretical remarks on the use of Fourier transform plane
intensity or amplitude features and on the noise performance of a feature extractor are
then advanced in Section 4. The database used and our five image tests on dominant
eigenimage feature vectors are summarized in Section 5. Our more extensive 25 image per
class database tests are presented in Section 6. These results include a comparison of
the performance of our system for five different discrimination vectors, a comparison of
the performance of amplitude and phase Fourier transform features, and a comparison of
the classifiers and feature extractors in the presence of noise.


2.  DIMENSIONALITY  REDUCTION  AND  DISTORTION-INVARIANCE 

If the input image or object is 256 x 256 pixels, its dimensionality is $n = 256^2$. The
Fourier transform plane for such an object still has a dimensionality of n. This is quite
prohibitive for subsequent feature-extraction, matrix transformations, or other similar
operations. Thus, we consider dimensionality reduction techniques.

As the first level of dimensionality reduction, we sample the Fourier transform plane
with a WRD. If an optical system is used to produce the Fourier transform, a commercial
WRD detector exists [2]. This unit consists of 32 wedge-shaped detector elements in
one-half of a circular detector and 32 annular-shaped detector elements in the other half
of the detector plane. This device thus provides 64 WRD outputs. One can also digitally
model such a device, of course. The ring detector elements provide rotation-invariance,
whereas the wedge detector elements provide scale-invariance (if the values of the
wedge-ring detector element readings are properly normalized for object energy). This
WRD-sampling, plus the training of our system on different distorted images, provides
distortion-invariance to our algorithm.
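
A digital model of such a detector can be sketched as follows. This is a minimal
illustration, assuming equal-width rings and equal-angle wedges, the FT intensity as the
sampled quantity, and normalization by the total collected energy; the function name
`wrd_sample`, the bin layout and the exclusion of the zero-frequency pixel are
assumptions of the example and are not taken from the commercial device.

```python
import numpy as np

def wrd_sample(image, n_wedges=32, n_rings=32):
    """Sample the FT intensity with wedges in one half-plane and rings in the
    other, returning n_wedges + n_rings energy-normalized features."""
    rows, cols = image.shape
    intensity = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    y, x = np.indices((rows, cols))
    dy, dx = y - rows // 2, x - cols // 2
    radius = np.hypot(dx, dy)
    angle = np.arctan2(dy, dx)                      # in [-pi, pi]
    r_max = min(rows, cols) // 2

    in_disc = (radius > 0) & (radius <= r_max)      # skip the DC pixel
    wedge_half = in_disc & (angle >= 0)
    ring_half = in_disc & (angle < 0)

    wedge_idx = np.minimum((angle / np.pi * n_wedges).astype(int), n_wedges - 1)
    ring_idx = np.minimum((radius / r_max * n_rings).astype(int), n_rings - 1)

    wedges = np.bincount(wedge_idx[wedge_half],
                         weights=intensity[wedge_half], minlength=n_wedges)
    rings = np.bincount(ring_idx[ring_half],
                        weights=intensity[ring_half], minlength=n_rings)
    features = np.concatenate([wedges, rings])
    return features / features.sum()                # energy normalization

rng = np.random.default_rng(0)
test_image = rng.random((64, 64))                   # placeholder input image
features = wrd_sample(test_image)
```

Because of the normalization, multiplying the input image by a constant leaves the 64
features unchanged.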

The WRD-sampling also provides a dimensionality reduction from n to 64, i.e. the Fourier
transform plane feature vectors $\{x_i''\}_{i=1}^{n}$ and $\{y_i''\}_{i=1}^{n}$ are
converted to WRD feature vectors $\{x_i'\}_{i=1}^{64}$ and $\{y_i'\}_{i=1}^{64}$.

As the second level of dimensionality reduction, we apply a K-L transformation [3] to the
autocorrelation matrix formed from the WRD feature vectors for each separate object
class. The autocorrelation matrix is formed from the 64 element $x_i'$ vectors for each
of the training set images x in class one, and a second matrix is formed from the
corresponding $y_i'$ vectors of images in class two. The eigenvalues and eigenvectors of
each matrix are calculated and tabulated. We retain the dominant $n_x$ and $n_y$
eigenimages per class. In general, $n_x \approx n_y = 1, 2, 3$. In our experiments, we
retained only the dominant eigenimage for each class.

To use these dominant eigenimages for pattern recognition and classification, we would
compute $z_i'$ (i = 1, ..., 64) for an unknown input z, project it onto the dominant
eigenimages or eigenvectors KL-1 and KL-2 (for class one and two respectively), and
select the class of the unknown input based upon which projection value is larger. In
practice, we calculate the dominant eigenvectors using newer and more efficient
algorithms [4].
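
The projection-based decision described above can be sketched with a direct
eigendecomposition. This is a minimal illustration on synthetic 64-element vectors; the
data, the noise level and the function names `dominant_eigenvector` and `classify` are
assumptions of the example, and a full eigendecomposition is used here in place of the
more efficient dominant-eigenvector algorithms of [4].

```python
import numpy as np

def dominant_eigenvector(features):
    """Return (fraction of the eigenvalue sum, unit eigenvector) for the
    dominant eigenvector of the autocorrelation matrix of `features`,
    an array of shape (n_images, n_features)."""
    r = features.T @ features / features.shape[0]
    eigvals, eigvecs = np.linalg.eigh(r)       # eigenvalues in ascending order
    return eigvals[-1] / eigvals.sum(), eigvecs[:, -1]

def classify(z, kl_1, kl_2):
    """Pick the class whose dominant eigenvector gives the larger projection.
    The absolute value removes the arbitrary sign of an eigenvector."""
    return 1 if abs(z @ kl_1) >= abs(z @ kl_2) else 2

# Synthetic feature vectors: five noisy training vectors per class.
rng = np.random.default_rng(0)
base_1 = np.zeros(64)
base_1[0] = 1.0
base_2 = np.zeros(64)
base_2[1] = 1.0
class_1 = base_1 + 0.05 * rng.standard_normal((5, 64))
class_2 = base_2 + 0.05 * rng.standard_normal((5, 64))

frac_1, kl_1 = dominant_eigenvector(class_1)
frac_2, kl_2 = dominant_eigenvector(class_2)

z_1 = base_1 + 0.05 * rng.standard_normal(64)   # unseen class-one vector
z_2 = base_2 + 0.05 * rng.standard_normal(64)   # unseen class-two vector
labels = (classify(z_1, kl_1, kl_2), classify(z_2, kl_1, kl_2))
```

The returned eigenvalue fraction gives a quick measure of how dominant the first
eigenimage is for a given training set.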


3.  NON-UNITARY  TRANSFORMATIONS 


The K-L or dominant eigenvector transformation (Section 2) represents a considerable
compression of data and simplifies feature extraction and classifier decisions. The
dominant eigenvectors represent each class well in the optimally compressed manner;
however, there is no assurance that those features which represent each class well will
be optimal for discriminating one class from another. Thus, dominant eigenimages are
useful for intra-class pattern recognition, but not necessarily for inter-class
discrimination. In a hyperspace feature vector and discriminant vector description,
unitary transformations do not change the distances of points or vectors in hyperspace.
To achieve discrimination or inter-class pattern recognition, non-unitary
transformations represent an attractive approach. These transformations can increase
interclass distances and hence provide improved discrimination. We pursue this approach
rather than employing more eigenvectors, since the latter approach would only further
increase the dimensionality and computational complexity of our processing.

F-K  Transformation 


The first non-unitary transformation we consider is the F-K transformation [5]. To
describe the steps in this algorithm, we first define $P_i$ as the a priori probability
for class i and $R_i'$ as the autocorrelation matrix for class i. We form the
autocorrelation matrices $R_1$ and $R_2$ for each class, where $R_i = P_i R_i'$, and we
form the full autocorrelation matrix $R = R_1 + R_2$. We then determine the
transformation matrix T that diagonalizes R, i.e.

$$ T R T^T = T (R_1 + R_2) T^T = I. \qquad (1) $$

By this transformation, we have orthogonally decomposed the full $R_1 + R_2$. Next, we
apply T to $R_1$ and $R_2$, i.e., we form new matrices for each class given by
$T R_1 T^T$ and $T R_2 T^T$.


These new correlation matrices have two attractive features:

(a) The eigenvectors $v_i^{(1)}$ and $v_i^{(2)}$ of $T R_1 T^T$ and $T R_2 T^T$ are equal.

(b) The eigenvalues $\lambda_i^{(1)}$ and $\lambda_i^{(2)}$ associated with $v_i^{(1)}$
and $v_i^{(2)}$ are related by

$$ \lambda_i^{(1)} = 1 - \lambda_i^{(2)}. \qquad (2) $$

From (2), we see that the dominant eigenvectors of the transformed class one matrix are
the least-dominant eigenvectors for the transformed class two matrix. Thus, those
eigenvectors which represent class one the least represent class two the best (in the new
F-K transformed feature space). From (2), these operations have separated the data in the
two classes.

Thus, we will select the two $v_i$ with the largest $|\lambda_i - 0.5|$ values. We will
denote these two eigenvectors (as defined above) of the transformed autocorrelation
matrices by FK-1 and FK-2. To use these new discriminant vectors to determine the class
of an unknown input image z, we transform z to a new $z' = Tz$. This transforms the
input data to the new FK space. We then project $z'$ onto an FK discriminant vector v
and calculate $v^T z' = d$. Depending on whether d is above or below our threshold, we
select class one or class two for the class of the input object. We normalize the FK
eigenvectors and refer to the projections onto the FK directions 1 and 2 (corresponding
to FK-1 and FK-2). We note that FK-1 and FK-2 do not refer to discriminant vectors for
classes one and two; rather, they refer to the two most dominant eigenvectors of the
transformed full autocorrelation matrix of both classes.
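
The F-K steps can be sketched numerically as follows. This is a minimal illustration
under stated assumptions: T is built from the eigendecomposition of the summed
autocorrelation matrix (one of several valid choices), equal a priori probabilities of
0.5 are used, and the six-dimensional random data and the function name
`fukunaga_koontz` are hypothetical. The summed matrix must be non-singular for this
construction to work.

```python
import numpy as np

def fukunaga_koontz(r1, r2):
    """Return (T, eigenvalues, eigenvectors) with T (r1 + r2) T^T = I and the
    eigen-pairs of T r1 T^T sorted by decreasing |eigenvalue - 0.5|."""
    lam, phi = np.linalg.eigh(r1 + r2)
    t = np.diag(lam ** -0.5) @ phi.T            # whitening transformation
    m1 = t @ r1 @ t.T
    lam1, v = np.linalg.eigh((m1 + m1.T) / 2)   # symmetrize against round-off
    order = np.argsort(-np.abs(lam1 - 0.5))
    return t, lam1[order], v[:, order]

rng = np.random.default_rng(1)
x = rng.standard_normal((25, 6)) * np.array([3, 2, 1, 1, 1, 1])   # class one
y = rng.standard_normal((25, 6)) * np.array([1, 1, 1, 1, 2, 3])   # class two
r1 = 0.5 * (x.T @ x) / 25                       # weighted by prior 0.5
r2 = 0.5 * (y.T @ y) / 25

t, lam1, v = fukunaga_koontz(r1, r2)
lam2 = np.diag(v.T @ (t @ r2 @ t.T) @ v)        # class-two eigenvalues
fk_1, fk_2 = v[:, 0], v[:, 1]                   # two most discriminating vectors

z = x[0]                                        # an example input vector
d = fk_1 @ (t @ z)                              # projection in the FK space
```

The arrays `lam1` and `lam2` sum to one element by element, which is the complementary
eigenvalue relation used to rank the shared eigenvectors.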


Foley-Sammon Transformation

In the F-S nonunitary transformation [6], we find a linear discriminant vector w that is
a linear combination of the $x_i$ and $y_i$ vectors in our two-class training set. The
vector w is selected to maximize the Fisher ratio [7]:

$$ J(w) = \frac{(\text{difference of means of projections})^2}{\text{sum of scatter of projections}}
       = \frac{|\tilde m_1 - \tilde m_2|^2}{\tilde s_1^2 + \tilde s_2^2}
       = \frac{w^T S_B w}{w^T S_W w}, \qquad (3) $$

where $\tilde m_k$ and $\tilde s_k^2$ are the mean and scatter of the class k
projections, $S_B$ is the between-class scatter matrix and $S_W$ is the within-class
scatter matrix [7]. The solution for w that maximizes (3) is

$$ w = S_W^{-1} (m_1 - m_2), \qquad (4) $$

where $m_1$ and $m_2$ are the vector means of the two classes. To use w for an unknown
input z, we form $w^T z = d$ and compare the projection value to the threshold T

$$ T = (\tilde m_1 + \tilde m_2)/2. \qquad (5) $$

Depending on whether d is greater or less than T, we select class one or class two for
the class of the unknown input image represented by the vector z.
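
The discriminant vector and threshold can be sketched as below. This is a minimal
illustration covering only the first (Fisher) discriminant direction, assuming a
non-singular within-class scatter matrix; the four-dimensional random data and the
function names `fisher_discriminant` and `classify` are hypothetical.

```python
import numpy as np

def fisher_discriminant(x, y):
    """Return (w, threshold) with w = S_W^{-1} (m1 - m2) and the threshold
    midway between the projected class means."""
    m1, m2 = x.mean(axis=0), y.mean(axis=0)
    s_w = (x - m1).T @ (x - m1) + (y - m2).T @ (y - m2)   # within-class scatter
    w = np.linalg.solve(s_w, m1 - m2)
    threshold = 0.5 * (w @ m1 + w @ m2)
    return w, threshold

def classify(z, w, threshold):
    """Class one projects above the threshold because w points from m2 to m1."""
    return 1 if w @ z > threshold else 2

rng = np.random.default_rng(2)
centre = np.array([2.0, 0.0, 0.0, 0.0])
x = centre + rng.standard_normal((25, 4))       # class-one training vectors
y = -centre + rng.standard_normal((25, 4))      # class-two training vectors

w, threshold = fisher_discriminant(x, y)
label_for_mean_1 = classify(x.mean(axis=0), w, threshold)
label_for_mean_2 = classify(y.mean(axis=0), w, threshold)
```

With this sign convention the class-one mean always projects above the threshold, so the
decision rule is "class one if d exceeds T, otherwise class two".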

4.  INTENSITY  OR  PHASE  FOURIER  TRANSFORM  FEATURES 

One particular aspect of our Fourier transform feature-space study is to determine if
the intensity or the phase of the Fourier transform features provides better
performance. As the basic theoretical justification for the performance and use of our
algorithm, we represent the intensity or phase of the wedge ring detected Fourier
transform output for an image as a random process. We have extensively investigated the
theoretical basis for this and the conditions to be satisfied for the resultant
$\{x_i\}$ and $\{y_i\}$ features to be validly represented as random vectors. We have
shown that the Fourier transform of an analog or discrete image can be represented by a
random process $\{x(t)\}$ if $\{x(t)\}$ is separable, has finite expected value, and is
continuous in the mean and in probability. Under these conditions, the resultant analog
or digital Fourier transform is an n-dimensional random vector. We have also shown that
the Fourier transform intensity and phase are continuous and yield real random vectors.
Finally, the WRD-sampled intensity and phase Fourier transform features are found to be
random variables (since the sum of random variables is a random variable).


Considerable work [8,9] exists on the representation of image data by the intensity or
phase of the Fourier transform. In general, the conditions under which the Fourier
transform phase features are adequate are less restrictive than the conditions under
which the Fourier transform magnitude features are adequate. If the zeroes of the
z-transform of a sequence occurring in reciprocal pairs lie on the unit circle, the phase
of the FT is adequate. The intensity of the FT is adequate if the z-transform does not
contain reciprocal pole-zero pairs, poles outside the unit circle, or zeroes inside the
unit circle.


5. DATABASES AND INITIAL RESULTS


The four image databases used are summarized in Table 1. They include scaled and rotated
images of the letters A and B and of hand-drawn tanks and trucks. For each of these two
object classes, we used a set of five images per class and a set of 25 images per class.
Various scaled and rotated views were included in each of these image sets. The specific
distorted object views included in each case are detailed in Table 1.


TABLE 1. Summary of Experimental Image Databases Used

TEST SETS     FIVE-IMAGE DATA BASE                     25-IMAGE DATA BASE
              SCALES          ROTATIONS                SCALES                    ROTATIONS

A and B       0.9, 1.0, 1.1   (for 0.9, 1.1 scales)    0.8, 0.9, 1.0, 1.1, 1.2   ±10°, ±5°, 0° (for each scale)

Hand-Drawn    0.9, 1.0, 1.1   (for 0.9, 1.1 scales)    0.8, 0.9, 1.0, 1.1, 1.2   ±10°, ±5°, 0° (for each scale)
Tank/Truck

In Table 2, we list all of the eigenvalues for the eigenvectors for the five-image
database for all four object types and for both intensity and phase Fourier transform
features. As seen, the eigenvalue of the dominant eigenvector for intensity FT features
is approximately 70 times that of the second dominant eigenvector. Using phase FT
features, the dominant eigenvector is considerably less dominant (in general). The
eigenvalue for the dominant eigenvector for A obtained from FT phase data is
exceptionally low (0.67). From this low eigenvalue, we expect low projection values and
hence more errors in our pattern recognition of letters using phase features. In general,
the dominance of the eigenimage in this data can be attributed to the fact that the image
database consists of scaled and rotated (in-plane rotation) images rather than different
aspect views of each object. In such distorted images, there is no appreciable new
information present in each object representation in the databases investigated.


TABLE 2. Eigenvalues (e-v) of WRD Fourier Transform
Eigenvectors (Five-Image Databases).

        INTENSITY    PHASE         INTENSITY    PHASE         INTENSITY    PHASE         INTENSITY    PHASE

e-v 1   0.983        0.992         0.983        0.886         0.99         0.67          0.99         0.95
e-v 2   0.17x10^-1   0.78x10^-2    0.166x10^-1  0.983x10^-1   0.71x10^-2   0.24          0.13x10^-1   0.43x10^-1
e-v 3   0.82x10^-4   0.28x10^-3    0.21x10^-3   0.117x10^-1   0.84x10^-4   0.72x10^-1    0.49x10^-3   0.186x10^-2
e-v 4   0.81x10^-5   0.115x10^-3   0.64x10^-4   0.236x10^-2   0.47x10^-4   0.11x10^-1    0.28x10^-4   0.77x10^-3
e-v 5   0.49x10^-6   0.11x10^-4    0.11x10^-5   0.138x10^-2   0.65x10^-5   0.38x10^-2    0.17x10^-4   0.70x10^-4

The projections of all five images per class on the dominant eigenimage for each class were tabulated. Our results show that the projections of all images in one class on the dominant eigenimage for that class were larger than the projections on the dominant eigenimage of the other class. However, the differences (for intensity FT features) were quite small (e.g. 0.99 versus 0.96). The projections on the second dominant eigenimages (for intensity FT features) gave lower projection values than those on the dominant eigenvector. In many cases, larger projections were obtained for the wrong images. Thus, for intensity FT features, the second dominant eigenvector should not be included. These results support our earlier observation that the eigenvectors generally provide intra-class recognition rather than inter-class discrimination. A comparison was also made of the dominant eigenimages obtained from the intensity-only and phase-only WRD Fourier transform features. The phase features provided larger differences of the projections onto the dominant eigenimages of each class (on the average) for the five-image tank versus truck database. However, for the five-image letter (A and B) database, phase features gave many errors. This was expected and is attributed to the small eigenvalue associated with the dominant eigenvector for A. Had we retained two dominant eigenvectors, better phase feature performance for the case of letters could be expected.
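The eigenvalue and projection quantities discussed above can be illustrated with a minimal sketch. It assumes each feature vector is normalized to unit length, so that the eigenvalues of the class correlation matrix sum to one (as the Table 2 entries approximately do), and it uses simple power iteration in place of whatever eigen-solver was actually used. The function names and the four-element feature vectors are hypothetical illustration data, not values from the report.

```python
import math


def normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def correlation_matrix(vectors):
    """Average outer product of the unit-norm feature vectors (trace = 1)."""
    units = [normalize(v) for v in vectors]
    dim = len(units[0])
    return [[sum(u[i] * u[j] for u in units) / len(units) for j in range(dim)]
            for i in range(dim)]


def dominant_eigenpair(matrix, iterations=200):
    """Power iteration: return (eigenvalue, unit eigenvector) of largest eigenvalue."""
    dim = len(matrix)
    vec = normalize([1.0] * dim)
    for _ in range(iterations):
        vec = normalize([sum(matrix[i][j] * vec[j] for j in range(dim))
                         for i in range(dim)])
    product = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
    return sum(a * b for a, b in zip(vec, product)), vec


def projection(feature, eigenvector):
    """Magnitude of the projection of a unit-normalized feature on an eigenvector."""
    unit = normalize(feature)
    return abs(sum(a * b for a, b in zip(unit, eigenvector)))


# Hypothetical feature vectors for two classes (not data from the report).
tanks = [[10.0, 2.0, 1.0, 0.5], [10.2, 1.9, 1.1, 0.4], [9.8, 2.1, 0.9, 0.6]]
trucks = [[2.0, 9.0, 3.0, 1.0], [2.1, 9.2, 2.8, 1.1], [1.9, 8.8, 3.1, 0.9]]

tank_value, tank_vector = dominant_eigenpair(correlation_matrix(tanks))
truck_value, truck_vector = dominant_eigenpair(correlation_matrix(trucks))
```

When the vectors within a class are nearly identical, as for scaled and rotated versions of one object, the dominant eigenvalue approaches one, and each vector projects more strongly on the eigenvector of its own class than on that of the other class.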

6. INITIAL EXPERIMENTAL RESULTS

6.1 Non-Unitary Transformations


All of the results included in this section were obtained on our more extensive database of 25 object images per class. In Figure 2, we show the scatter plots for the projections of all tank and truck images onto the dominant eigenimages for tanks and for trucks respectively. As seen, all images can be separated and correctly classified from either projection alone. However, all projection values (even those on the dominant eigenimage of the other class) are quite large (above 0.95). Only five points (X) are shown for the 25 training images corresponding to different scaled and rotated truck images, because all five rotated images for each scale factor yielded identical projection values. This verifies the good rotation invariance of our WRD features and of the training set used. The variation in the projection values due to scale differences can be attributed to the normalization technique used (each eigenimage was normalized only within one class) and to the significantly larger number of pixels present in the tank object compared to the truck object (800 versus 280 pixels).


In Figures 3 and 4, similar data are shown for the projections onto the two dominant FK discriminant vectors (Figure 3) and onto the best FS vector (Figure 4). The data in Figures 2-4 were obtained from intensity-only WRD Fourier transform features. Note the significantly different axis scales in Figures 2-4. To compare which feature extraction technique (dominant eigenvectors, FK vector, FS vector) yields the best performance, we computed a separation measure

$$\text{separation measure} = \frac{\text{difference of means of projections per class}}{\text{sum of standard deviations per class}}$$


for five different discriminant vectors for the tank/truck and A/B image sets. The results (for intensity-only WRD Fourier transform features) are summarized in Table 3. As shown, both dominant eigenimages (KL-1 and KL-2) perform quite well (even though they only achieve intra-class compression). This can be attributed to the a priori existence of different wedge and ring Fourier transform features for the two object classes and to the distortion-invariance and lack of information loss incurred by wedge ring detection sampling of the Fourier transform plane. For the tank/truck images, the performance of both F-K features is comparable, whereas the performance of the F-S vector is slightly better. For the case of the letters A and B, the non-unitary transformations achieve considerable improvement (by approximately a factor of 2). Thus, in some cases, non-unitary transformations will improve performance. The results are quite data dependent. These non-unitary transformations do not degrade performance and in general improve it. Thus, such feature-extraction techniques appear to be merited in all instances.
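The separation measure used in this comparison (difference of the class means of the projections divided by the sum of the class standard deviations) can be sketched as follows. The report does not say whether population or sample standard deviations were used; the population form is assumed here, and the function name and projection values are hypothetical.

```python
import statistics


def separation_measure(projections_a, projections_b):
    """Difference of class means divided by the sum of class standard deviations.

    Assumption: population standard deviation (statistics.pstdev) is used.
    """
    mean_gap = abs(statistics.mean(projections_a) - statistics.mean(projections_b))
    spread = statistics.pstdev(projections_a) + statistics.pstdev(projections_b)
    return mean_gap / spread


# Hypothetical projection values for two classes on one discriminant vector.
class_a = [0.99, 0.98, 0.97]
class_b = [0.95, 0.94, 0.96]
measure = separation_measure(class_a, class_b)
```

A larger value indicates that the two clusters of projections lie further apart relative to their scatter, which is why the measure can rank eigenvector, FK and FS projections even when their axis scales differ widely.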


6.2  Noise  Performance 

To further test and compare our different feature-extraction approaches, we added noise to the image data and recalculated the projections and the associated separation measures. The results are summarized in Table 4 for our 25 feature vectors, for five different amounts of noise and for both databases. For the tank and truck image data, very little difference occurs as the noise level is varied. This can be attributed to the fact that only several wedge and ring elements dominate the feature vectors. Since the noise is evenly distributed over all wedge and ring feature elements, its effect on the dominant feature elements is reduced. For the case of letter recognition and classification, the separation measure decreases as the noise level is increased. This is the general trend we would expect. The amount of decrease is generally the same for all five discrimination vectors. The difference between the letter and tank/truck cases can be attributed to the fact that letters have more structure and hence their Fourier transform plane information is more uniformly distributed over all of the wedge and ring sampling elements. Hence, the effect of noise is more fully transferred to such a feature space. It should be noted that in all cases, good performance was obtained.


7. WRD FOURIER TRANSFORM PHASE VERSUS INTENSITY FEATURES


Similar tests were performed for the case of phase-only WRD Fourier transform features. For the tank/truck data, the separation measure for phase features was found to be better (by 10-65%) than for intensity features. For the A and B letter images, the phase features sometimes provided better separation measures, but in general gave worse results. This can be attributed to the small 0.67 eigenvalue of the dominant phase eigenvector for the letter A. Tests were also performed using combined intensity and phase features. For the tank/truck images, 10-90% better separation measures were obtained with phase features. For the letters A and B, the separation measure was never significantly better and often was significantly worse than when intensity-only features were used.

FIGURE 2. Intensity-Only WRD Fourier Transform Features Projected onto Dominant Tank/Truck Eigenimages (for 25 Image Database). Axis 1 = Dominant Truck Eigenimage; Axis 2 = Dominant Tank Eigenimage; • = Tank Images; X = Truck Images.

FIGURE 3. Intensity-Only WRD Fourier Transform Feature Projections for Tank/Truck Images onto FK Vectors (25 Images/Class).

FIGURE 4. Projections of 25 Image/Class Tank/Truck Data on Best Foley-Sammon Vector.

TABLE 3. Separation Measures for Different Intensity Feature-Extraction Techniques.

TABLE 4. Noise Performance of Intensity WRD Fourier Transform Features for Different Feature Extraction Techniques (Separation Measure Tabulated).

The performance of phase features with noise was also tested. For the tank/truck images, the separation measure decreased as the noise increased (by a factor of 0.3 to 10.0). This was quite significant and worse than the intensity feature results, which showed negligible variation with noise. Similarly large reductions in the separation measure (by factors of 0.3 to 10.0) were obtained for the case of the letters A and B as the noise was increased. These losses were much larger than for the intensity features.

The conclusions reached from this limited testing are that phase features provide better separability. Better performance can be expected for the case of letters if the second dominant eigenvector is included. The noise performance of phase features appears to be worse. This can be attributed to the more uniform distribution of Fourier transform phase over all of the WRD features (compared to the concentration of the intensity features in fewer WRD elements).

8.  SUMMARY  AND  CONCLUSIONS 

In this paper, we have addressed a hierarchical multi-level general feature-space pattern recognition system for multi-class distortion-invariant object recognition. Attention was given to dimensionality reduction and its importance and success were demonstrated. The Fourier transform plane was found to allow significant dimensionality reduction. Wedge ring detector Fourier transform sampling and Karhunen-Loeve (KL) or dominant eigenvector calculations were found to allow considerable reduction and compression with little information loss. To provide discrimination, non-unitary transformations were used and found either to improve discrimination or to cause negligible loss in performance. The Fukunaga-Koontz (FK) and Foley-Sammon (FS) non-unitary transformations were considered. Both perform comparably, with the FS technique being somewhat better.

Quantitative experimental data and excellent performance were obtained on various image databases. The dominant eigenimage performed quite well if it was very dominant. When it was not dominant, non-unitary transformations helped performance considerably. If several of the feature elements are dominant, noise performance improves. This provides further motivation for reducing the number of feature elements and for devising schemes in which only several features are dominant. Our theoretical contributions on random vector modeling, noise performance and sample matrix calculations are quite general and useful in many other feature-extraction problems. Our study of intensity and phase Fourier transform features found phase features to be preferable, but also found that phase features generally perform worse in the presence of noise (since they are more uniformly distributed in Fourier transform space).
ABSTRACT

A hierarchical feature extraction pattern recognition technique is described and experimental test data are presented. The multi-level system estimates the class of the object and its aspect view in level one. A nonlinear iterative least squares estimator comprises the level two processor. A moment-based feature extractor is used. The level one system allows the classifier to use features that are jointly Gaussian random variables. Experimental results on a set of pipe images are presented.

1. INTRODUCTION

Feature extraction is a major computationally efficient approach to pattern recognition. In this paper we consider the use of a moment-based feature extractor for distortion-invariant object identification and classification. Moments were selected as the feature space to be used because of four unique aspects that these features exhibit:

(a) They can be computed in parallel [1].

(b) They allow easy correction after computation for various system computational errors [2].

(c) They provide position, orientation and scale information on the object [3].

(d) They are jointly-Gaussian random variables (JGRVs) [4] and hence allow use of a Bayesian classifier [5] and do not require a training set of imagery.

Our concern is to be able to recognize and classify objects in multiple classes independent of geometrical distortions due to the object's orientation and view angle and to estimate the distortions. The former is needed for object recognition and the latter for object control (i.e. by a robot).

In Section 2, we briefly review how the moment vector $m_i(b)$ for an object in class $i$ can be computed for different object distortions, described by $b$, from a reference $m_i(b^0)$ vector (Section 2.1). We then review (Section 2.2) the conditions under which moment features are JGRVs and the simplified classifier that results (Section 2.3). To utilize the JGRV property of moments, a new classifier is required. This two-level classifier is summarized in Section 3 and demonstrated for a new database of pipe parts in Section 4.

2. MOMENT FEATURE SPACE

A moment feature space is most attractive for many reasons, several of which were noted in Section 1. In this section, we expand upon several of the less-detailed properties of moments, especially those aspects that apply to our new moment-based classifier.

2.1 Distortion Parameter Effects

For our application, we consider the vector $b = (a, b, x_0, y_0, \theta)$, where the elements of $b$ contain the horizontal ($a$) and vertical ($b$) scale of the object, its translation ($x_0$, $y_0$) and in-plane rotation ($\theta$) with respect to a reference $b^0$ vector. Computing $m_i(b)$ from $m_i(b^0)$ for different distortions described by $b$ involves a simple matrix multiplication. For intensity changes by a factor $k$, [Eq. (1) is not legible in the source]. For scale changes,

$$\hat{m}_{pq} = (1/a)^{p+1} (1/b)^{q+1} m_{pq}. \qquad (2)$$

For translations, [the equation is not legible in the source]. Thus, new $\hat{m}$ moments are easily calculated from the nominal $m$ moments for different distortions.
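The scale relation in Eq. (2) can be checked on a shape whose moments are known in closed form. The sketch below is only an illustration: it assumes the convention implied by Eq. (2), in which a scale factor $a$ shrinks the object width by $1/a$ (and likewise $b$ for the height), and it uses the analytic moments of a unit-intensity rectangle. The function names are hypothetical.

```python
def rectangle_moment(p, q, width, height):
    """Moment m_pq of a unit-intensity rectangle [0, width] x [0, height].

    Closed form of the double integral of x**p * y**q over the rectangle.
    """
    return width ** (p + 1) * height ** (q + 1) / ((p + 1) * (q + 1))


def scale_moment(m_pq, p, q, a, b):
    """Eq. (2): moment after horizontal scale a and vertical scale b."""
    return (1.0 / a) ** (p + 1) * (1.0 / b) ** (q + 1) * m_pq


# Reference rectangle 4 x 3; with a = 2 and b = 0.5 it becomes 2 x 6
# under the assumed convention (width / a, height / b).
orders = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2), (2, 1)]
predicted = {pq: scale_moment(rectangle_moment(*pq, 4.0, 3.0), *pq, 2.0, 0.5)
             for pq in orders}
direct = {pq: rectangle_moment(*pq, 2.0, 6.0) for pq in orders}
```

Because each moment is rescaled independently by a known factor, the distorted moment vector follows from the reference vector without recomputing any integral over the image, which is the property the second-level estimator relies on.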


2.2  Moment  Statistics 

Finite spatial sampling of the object causes variations in the measured moments. It is fairly straightforward to show that the statistics of these sampled moments are good estimates of the true moments and that the moment features are JGRVs. The number of object pixels required to satisfy the Gaussian pdf assumption is much less than the number of object pixels needed for recognition and classification. Thus, a JGRV model for moment features is quite valid. However, this is only true for the distortions in $b$; specifically, the moments are JGRVs with respect to scale, translation and in-plane rotation $\theta$, but not for out-of-plane rotations $\phi$. Similarly, one cannot devise deterministic linear transformations as in Section 2.1 for the variations in $m$ with $\phi$. Thus, for all of the above reasons, different aspect views ($\phi$) of each object must be considered as separate classes. We refer to these as view classes to distinguish them from object classes (different objects specifically).


2.3 JGRV Classifier

The conventional Bayesian classifier [6] that minimizes the probability of an incorrect view-class $i$ estimate can be used with conventional assumptions (such as JGRV features) to obtain the discriminant function

$$g_i(x) = (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i), \qquad (6)$$

where $\mu_i$ and $\Sigma_i = \Sigma$ are the mean vector and covariance matrix for class $i$. In most cases, $\mu_i$ and $\Sigma_i$ must be estimated from training sets of imagery. When the measured feature vector $x$ is a moment vector $m$, only one object view per class is needed to measure $\mu_i$ and $\Sigma$. The class $i$ that minimizes $g_i(x)$ is the best class estimate. The discriminant function in (6) is the Mahalanobis distance. If $\Sigma = I$, (6) becomes a Euclidean distance measure or nearest-neighbor classifier. This assumes that all moments are independent and that the expected variations of all moments are equal.

3. NEW CLASSIFICATION ALGORITHM

3.1 Overview

To utilize the classifier in (6), each view-class must be treated as a separate class $i$. For 9 objects and 36 aspect views per object (10° increments from a fixed depression angle), there are 9 x 36 = 324 view-classes to be searched. In Section 3.2, we describe our second-level classifier, which solves (6) for the view-class $i$ and the distortion parameter vector $b$. Its parameters are discussed in Section 3.3. To reduce the number of view-classes $i$ to be searched, a hierarchical first-level classifier is used in which estimates of the object class (Section 3.4) and the aspect view (Section 3.5) of the object are obtained and passed to the second-level classifier, where the final decision on the view-class $i$ and the distortion parameters $b$ is made and the confidence of our estimates is provided. A block diagram of the classifier is provided in Figure 1.

FIGURE 1 Block diagram of our Two-Level Moment-Based Classifier.

3.2 Iterative Second-Level Distortion-Parameter Estimator

The second-level classifier is described first to provide added motivation for the first-level classifier. We desire to combine the ease with which $m_i(b)$ can be calculated for new $b$ vectors and the classifier in (6) to estimate $b$ for the input object and to provide final estimates of the object class and aspect view. The basic concept is to vary $i$ and $b$ to minimize $e_i = m - m_i(b)$, where $m$ is the measured moment vector of the input object. The square error measure is $E_i = e_i^T \Sigma^{-1} e_i$, where $\Sigma^{-1}$ is the weighting matrix used. To minimize $E_i$ with respect to $b$, an iterative algorithm is used since $m_i(b)$ is a nonlinear function of $b$. The algorithm is of the general form

$$b^{k+1} = b^k + \alpha^k r^k, \qquad (7)$$

where $b^k$ is the $b$ estimate at iteration $k$ and $b^{k+1}$ is a point in an r-dimensional space a distance $\alpha^k$ in the direction $r^k$ from the present $b^k$. We expand $m_i(b)$ in a Taylor series about the present $b^k$ point as

$$m_i(b) \approx m_i(b^k) + J^k (b - b^k), \qquad (8)$$

where $J^k$ is the Jacobian of $m_i(b)$ with respect to $b$ at the k-th iteration. Substituting $e_i$ and (8) into $E_i$ and solving for the minimum $b$ (by setting $\nabla E_i(b) = 0$), we obtain

$$b^{k+1} = b^k + [(J^k)^T \Sigma^{-1} J^k]^{-1} (J^k)^T \Sigma^{-1} [m - m_i(b^k)]. \qquad (9)$$

Eq. (9) is the nonlinear iterative algorithm used in our second-level classifier to estimate $b$. For each $i$, (9) is repeated and we calculate

$$\Delta g_i = [g_k(i,b) - g_{k-1}(i,b)] / g_k(i,b), \qquad (10)$$

where $g_i(b) = E_i$. The iterations $k$ are continued until $\Delta g_i$ is less than a convergence threshold $T$. The algorithm in (9) is the Gauss-Newton formulation, which degenerates to the Newton algorithm with appropriate assumptions [7].
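The Gauss-Newton update of (9) with the relative-change stopping rule of (10) can be sketched as below. This is an illustration under several assumptions, not the authors' implementation: the weighting matrix is taken as the identity (one of the two choices discussed in Section 3.3), the Jacobian is formed by central finite differences, the magnitude of (10) is compared with T, and the three-element `toy_model` (unit reference moments of orders 00, 10 and 01 rescaled as in Eq. (2)) and the measured vector are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def jacobian(model, b, step=1e-6):
    """Central-difference Jacobian: entry [r][c] is d model_r / d b_c."""
    columns = []
    for c in range(len(b)):
        upper, lower = list(b), list(b)
        upper[c] += step
        lower[c] -= step
        columns.append([(u - v) / (2 * step)
                        for u, v in zip(model(upper), model(lower))])
    return [[columns[c][r] for c in range(len(b))] for r in range(len(columns[0]))]


def squared_error(measured, predicted):
    """E = e^T e, i.e. the error measure with identity weighting."""
    return sum((m - p) ** 2 for m, p in zip(measured, predicted))


def gauss_newton(model, measured, b_start, threshold=1e-4, max_iter=50):
    """Iterate Eq. (9) until the relative change of Eq. (10) drops below threshold."""
    b = list(b_start)
    g_prev = squared_error(measured, model(b))
    for iteration in range(1, max_iter + 1):
        jac = jacobian(model, b)
        resid = [m - p for m, p in zip(measured, model(b))]
        n = len(b)
        jtj = [[sum(row[i] * row[j] for row in jac) for j in range(n)]
               for i in range(n)]
        jtr = [sum(row[i] * e for row, e in zip(jac, resid)) for i in range(n)]
        delta = solve_linear(jtj, jtr)          # [(J^T J)^-1] J^T [m - m(b)]
        b = [value + change for value, change in zip(b, delta)]
        g = squared_error(measured, model(b))
        if g == 0.0 or abs(g - g_prev) / g < threshold:
            return b, iteration
        g_prev = g
    return b, max_iter


def toy_model(b):
    """Hypothetical moments (m00, m10, m01) for scales (a, s), unit references."""
    a, s = b
    return [1.0 / (a * s), 1.0 / (a * a * s), 1.0 / (a * s * s)]


# Measured vector: model at (a, s) = (1.25, 0.8) plus a small perturbation.
measured = [1.01, 0.79, 1.26]
estimate, iterations_used = gauss_newton(toy_model, measured, [1.0, 1.0])
```

The small perturbation in the measured vector keeps the final error nonzero, so the ratio in (10) stays well defined; with a perfect fit the denominator would vanish, which is why the sketch also stops when the error is exactly zero.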

3.3  Second-Level  Classifier  Parameters 

The iterative algorithm in (9) and the second-level classifier require selection of various parameters. These are summarized in Table 1 and discussed below.

The convergence threshold $T$ determines when the iterations $k$ are stopped and how small (10) becomes. $T = 10^{-4}$ corresponds to a difference of 0.01% in (10). The confidence value $C$ is a measure of the confidence of our estimates. It is obtained by measuring the distances $d_1$ and $d_2$ between the input $m$ vector and the two closest $m_i(b)$ vectors and is defined as $C = 100[1 - d_1/d_2]$, where $d_1 < d_2$. Calculation of $J$ is simplified by evaluating it for $b$ with $(x_0, y_0) = (0, 0)$ and $(a, b) = (1, 1)$, i.e. assuming the presently calculated distortion is correct. This is equivalent to viewing each iteration as an update of the prior $b^k$ rather than the initial $b^0$ estimate. This greatly simplifies calculation of $J$ and (9) at each iteration.
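The confidence value $C = 100[1 - d_1/d_2]$ can be illustrated with a short sketch. Euclidean distance is assumed here for simplicity; the report may use the weighted distance of the second-level classifier instead. The function name and the sample vectors are hypothetical.

```python
import math


def confidence(measured, candidates):
    """C = 100 * (1 - d1 / d2) for the two candidates closest to the input.

    Assumption: plain Euclidean distance between feature vectors.
    """
    distances = sorted(math.dist(measured, candidate) for candidate in candidates)
    d1, d2 = distances[0], distances[1]
    if d2 == 0.0:
        return 0.0  # two candidates coincide with the input: no preference
    return 100.0 * (1.0 - d1 / d2)


# Hypothetical input and candidate reference vectors.
clear_case = confidence([0.0, 0.0], [[1.0, 0.0], [4.0, 0.0], [0.0, 10.0]])
ambiguous_case = confidence([0.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

C is near 100 when the best match is much closer than the runner-up and falls to 0 when the two are equidistant, so a threshold on C acts as a rejection rule; a threshold of 0 forces a decision for every input.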


TABLE 1 Second-Level Classifier Parameters

SYMBOL   PARAMETER               REMARKS
T        Convergence Threshold   Typically 0.1
C_2      Confidence Value        C_2 = 100[1 - d_1/d_2]
J        J Calculation           b = [1,1,0,0]^T
Σ^-1     Σ^-1 Calculation        Σ^-1 = I or Σ^-1 = W W^T
b^0      Initial Estimate        Formed from m_00, m_10/m_00 and m_01/m_00 (entry partly illegible in the source)

Calculation of $\Sigma^{-1}$ is quite difficult since the exact $\Sigma$ matrix is quite ill-conditioned. The two choices considered in our system are $\Sigma^{-1} = I$ and $\Sigma^{-1} = W W^T$. The choice $\Sigma^{-1} = I$ weights all features equally and assumes independent features. In our first-level classifier, we calculate a multiclass Fisher projection matrix $W$. In a Fisher space, $\Sigma^{-1} = W K^{-1} W^T$, where $K$ is the covariance matrix of the Fisher features. With $K = I$, our second $\Sigma^{-1}$ choice is obtained. The second $\Sigma^{-1}$ estimate contains some information about the object separation of the reference set. Initial estimates of the distortion parameters in $b^0$ are obtained directly from the measured $m_{pq}$ as listed in Table 1.
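The effect of the two weighting choices on the distance in (6) can be sketched as follows. With the identity, the measure is the squared Euclidean distance; with $W W^T$, it equals the squared length of the error vector after projection by $W^T$. The function names, the one-column $W$ and the class means are hypothetical.

```python
def quadratic_form(error, inv_cov):
    """Return e^T * inv_cov * e for a list vector and a list-of-lists matrix."""
    n = len(error)
    return sum(error[i] * inv_cov[i][j] * error[j]
               for i in range(n) for j in range(n))


def identity(n):
    """n x n identity matrix: equal weighting of all features."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def fisher_inverse_covariance(w):
    """W W^T for an m x (c-1) projection matrix W (the K = I case)."""
    m = len(w)
    return [[sum(a * b for a, b in zip(w[i], w[j])) for j in range(m)]
            for i in range(m)]


def classify(x, class_means, inv_cov):
    """Return the class label whose mean minimizes the weighted distance."""
    def distance(label):
        error = [xi - mi for xi, mi in zip(x, class_means[label])]
        return quadratic_form(error, inv_cov)
    return min(class_means, key=distance)


# Hypothetical two-feature example: W keeps only the first feature.
means = {0: [0.0, 0.0], 1: [3.0, 10.0]}
sample = [2.5, 1.0]
by_identity = classify(sample, means, identity(2))
by_fisher = classify(sample, means, fisher_inverse_covariance([[1.0], [0.0]]))
```

In this toy case the two weightings disagree: the identity weighting counts the large difference in the second feature, whereas the $W W^T$ weighting ignores it because $W$ has no component along that feature.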

3.4  First-Level  Classifier:  Object  Class  Estimates 

To reduce the number of view-classes $i$ for which the second-level classifier of (9) and (10) and Table 1 must be used, object class estimates are obtained in the first-level classifier. This is achieved by a hierarchical classifier. To provide invariance to the distortions in $b$, the central moments $\mu_i$ (normalized for translation and scale) for each $m_i$ are computed. To select the best subsets at each node in the hierarchical node tree, a multiclass Fisher projection matrix $W$ of size $m \times (c-1)$ is computed, where $m$ is the number of moments and $c$ is the number of object classes. $W$ is chosen to maximize the ratio of the total between-class scatter to the total within-class scatter (with respect to the overall sample mean). The normalized central moments $\mu$ are then transformed into this multidimensional Fisher space as new feature vectors $W^T \mu$. In practice, the $\mu$ are projected onto only the two dominant Fisher feature vectors.

An example of such a multiclass projection for 9 objects (Section 4) is shown in Figure 2. In this classifier, all aspect views of each object (36 for the case shown) are viewed as different versions of each object to be clustered and separated from aspect views of the other objects. The various data points in each cluster correspond to scatter due to 36 aspect views per object class. From this multiclass Fisher projection, the object subsets (these may be multiple classes) that are best separated at node 0 in our hierarchical tree are determined. A new two-class Fisher discriminant vector $w$ is then computed that optimizes the Fisher ratio for these two object subsets. The corresponding $w^T \mu$ projections for the two class subsets at a later node in the tree are shown in Figure 3. As seen from this data, the two object subsets (represented by the symbols 0 and 1) can easily be separated by the simple linear discriminant $w$ defined by Fisher feature 1 for this node. Thus, our class estimator proceeds by forming multiclass Fisher projections of the available reference imagery and from this selecting the subsets to be separated at each node in the tree. A different multiclass case is considered at each node and a different two-class Fisher discriminant vector $w$ is then calculated for use at each node. These procedures are performed on available reference imagery prior to classification and thus need not be performed in real-time. During classification, only the simple vector inner product $w_n^T \mu$ must be calculated for each node $n$. A confidence threshold $C_1 = 35$ for the object class estimator is used at each node. If $C_1 < 35$, both subsets at that node are passed. $C_1$ is similar to $C_2$ but in Fisher space.
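A two-class Fisher discriminant and the node decision described above can be sketched as follows. The discriminant is the standard $w = S_w^{-1}(\bar{x}_1 - \bar{x}_0)$. The way $C_1$ is formed here (from the distances of the projected input to the two projected subset means, by analogy with $C_2$) is an assumption, as are the function names and the two-feature sample data.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mean_vector(samples):
    """Component-wise mean of a list of feature vectors."""
    return [sum(column) / len(samples) for column in zip(*samples)]


def within_scatter(subset_0, subset_1):
    """Sum over both subsets of (x - mean)(x - mean)^T."""
    dim = len(subset_0[0])
    scatter = [[0.0] * dim for _ in range(dim)]
    for subset in (subset_0, subset_1):
        centre = mean_vector(subset)
        for sample in subset:
            dev = [s - c for s, c in zip(sample, centre)]
            for i in range(dim):
                for j in range(dim):
                    scatter[i][j] += dev[i] * dev[j]
    return scatter


def fisher_vector(subset_0, subset_1):
    """Two-class Fisher discriminant: solve S_w w = mean_1 - mean_0."""
    gap = [b - a for a, b in zip(mean_vector(subset_0), mean_vector(subset_1))]
    return solve_linear(within_scatter(subset_0, subset_1), gap)


def project(w, x):
    """Scalar inner product w^T x computed at each node."""
    return sum(wi * xi for wi, xi in zip(w, x))


def node_decision(w, subset_0, subset_1, x, c1_threshold=35.0):
    """Return the subset labels passed on from this node (assumed C_1 rule)."""
    d_0 = abs(project(w, x) - project(w, mean_vector(subset_0)))
    d_1 = abs(project(w, x) - project(w, mean_vector(subset_1)))
    near, far = sorted((d_0, d_1))
    c1 = 100.0 * (1.0 - near / far) if far > 0.0 else 0.0
    if c1 < c1_threshold:
        return [0, 1]  # low confidence: pass both subsets
    return [0] if d_0 < d_1 else [1]


# Hypothetical two-feature reference data for the two subsets at one node.
subset_0 = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
subset_1 = [[5.0, 4.0], [6.0, 4.0], [5.0, 5.0], [6.0, 5.0]]
w = fisher_vector(subset_0, subset_1)
```

Computing `w` is the off-line step performed on reference imagery; at classification time only `project` and the threshold test run at each node, which matches the remark that only an inner product per node is needed.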



FIGURE 2 Multiclass Node-0 Fisher Projection for the Database of 9 Pipe Parts.

FIGURE 3 Two-Class Vector Inner Product wTm Fisher Projection at Node-2 for the Database of 9 Pipe Parts.

FIGURE 4 Representative Images of Objects in the Four Main Classes: (a) Hose Tee (Class 0); (b) PVC Tee (Class 1); (c) PVC Elbow (Class 2); (d) Hose Elbow (Class 3).


TABLE 2
Effect of First-Level Classifier
(9 Aspect References Every 40° Used, Tested Against All 324 Images)

TEST     CONDITIONS ON                      AVERAGE NUMBER OF VIEW CLASSES   PERCENT CORRECT
NUMBER   FIRST-LEVEL CLASSIFIER             PASSED TO SECOND-LEVEL           OUT OF 324
1        First-level enabled                --                               --
2        Aspect-ratio estimator not used    --                               --
3        Object-class estimator not used    --                               --
4        First-level disabled               --                               --

(-- : entry not legible in the source.)

TABLE 3
Effect of Convergence Threshold on Number of Iterations Required

TEST     ITERATION UPDATE INCREASED    CONVERGENCE             PERCENT CORRECT   AVERAGE NUMBER
NUMBER   LINEARLY OVER N ITERATIONS    THRESHOLD RANGE         OUT OF 324        OF ITERATIONS
--       N = 25                        T = 10^-4 - 10^-1       98.2              17 - 13
--       N = 5                         T = 10^-4 - 10^-1       98.2              6.3 - 5.7
--       N = 25                        T = 0.5 - 1.0           98.2              2

(-- : entry not legible in the source.)

TABLE 4
Effect of Covariance Matrix Estimate Used

TEST     COVARIANCE          PERCENT
NUMBER   ESTIMATE            CORRECT
--       4 Fisher Vectors    --
--       2 Fisher Vectors    --
--       Identity            --

(-- : entry not legible in the source.)

3.5  First-Level  Classifier:  Aspect  Estimates 

Once the object classes (that are sufficiently acceptable to be checked further in the second-level classifier) have been selected, those aspect views of each such object class (that should be included in the view classes $i$ in the second-level classifier) are estimated from $m_{pq}$. The use of moment features provides a quite convenient method. Specifically, we estimate the aspect ratio (ratio of its length to height) of the input object as $\hat{A} = \mu_{20}/\mu_{02}$, where $\mu_{20} = m_{20} - m_{10}^2/m_{00}$ and $\mu_{02} = m_{02} - m_{01}^2/m_{00}$. The aspect ratios $A$ for all reference objects in the estimated object class(es) are calculated and $K = \hat{A}/A$ is formed. The aspect view with $K$ closest to one and all aspect views within a factor $T_A = 1.5$ (the aspect threshold) of this are passed as members of the view classes $i$ to be further processed by the second-level classifier. The value $\hat{A}$ can also be used to omit object class estimates (from Section 3.4) with no aspect ratio in the proper range. If a lower $T_A$ value (closer to 1.0) is used, the number of aspect views per object class passed to the second-level classifier can be restricted to 1 or 2 with excellent final classification performance.
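The aspect-ratio estimate and the view selection can be sketched as below. The moment formulas follow the text; the selection rule is one reading of "within a factor $T_A$" (views whose $K$ differs from the best $K$ by at most a factor $T_A$) and is an assumption, as are the function names, the small test image and the reference ratios.

```python
import math


def raw_moment(image, p, q):
    """m_pq = sum over pixels of x**p * y**q * f(x, y); image[y][x] indexing."""
    return sum((x ** p) * (y ** q) * value
               for y, row in enumerate(image)
               for x, value in enumerate(row))


def aspect_ratio(image):
    """Aspect-ratio estimate mu_20 / mu_02 from second-order central moments."""
    m00 = raw_moment(image, 0, 0)
    mu20 = raw_moment(image, 2, 0) - raw_moment(image, 1, 0) ** 2 / m00
    mu02 = raw_moment(image, 0, 2) - raw_moment(image, 0, 1) ** 2 / m00
    return mu20 / mu02


def select_views(input_ratio, reference_ratios, t_a=1.5):
    """Pass the view with K closest to one plus views within a factor t_a of it."""
    k = {view: input_ratio / ratio for view, ratio in reference_ratios.items()}
    best = min(k, key=lambda view: abs(math.log(k[view])))

    def within_factor(view):
        ratio = k[view] / k[best]
        return max(ratio, 1.0 / ratio) <= t_a

    return sorted(view for view in k if within_factor(view))


# Hypothetical binary image: a rectangle 8 pixels wide and 2 pixels high.
image = [[1] * 8, [1] * 8]
estimated = aspect_ratio(image)

# Hypothetical reference aspect ratios keyed by aspect angle in degrees.
references = {0: 20.0, 40: 10.0, 80: 2.0}
narrow = select_views(estimated, references, t_a=1.5)
wide = select_views(estimated, references, t_a=2.5)
```

Lowering `t_a` toward 1.0 shrinks the list of candidate views handed to the second-level classifier, which is the trade-off between computation and robustness that the text describes.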

4. DATABASE RESULTS

4.1  Database 

The new database used consisted of four different classes of pipe parts (Figure 4). Two different types of hose tees, four different types of PVC tees and two different types of PVC elbows were included (9 different pipe objects in four classes). 512 x 512 pixel digitized images of each of the 9 objects were obtained from a 50° depression angle at 10° aspect view increments (36 aspect views per object). These 324 images were reduced to 128 x 128 pixels and binarized.

4.2  Hierarchical  Node  Tree 

The multiclass Fisher projections in Figure 2 show the scatter of the different pipe parts. From such plots, the subsets used at each node in the tree were chosen. Figure 5 shows the level-one classifier hierarchical node tree. A two-class Fisher discriminant vector is computed for each node and used to determine the subset choice at each node. An example of the scalar projections at node 2 was shown in Figure 3.


[Node tree diagram: INPUT at the root; leaves CLASS 0, CLASS 3, CLASS 1, CLASS 2.]

FIGURE 5 Hierarchical Node Tree for Class Estimation in the Level-One Classifier.


4.3  Experimental  Results 

Our extensive simulation tests are summarized in Tables 2-4. The nominal values used for the various classifier parameters were: convergence threshold $T = 10^{-4}$, confidence threshold $C_2 = 0$ (this forces a decision to be made for each input), covariance matrix $\Sigma$ [nominal choice not legible in the source], $C_1 = 35$ and $T_A = 1.5$. The reference set contains 9 aspects per class at 40° intervals. Unless otherwise noted, these conditions are used in each test. In Table 2, the overall performance obtained with and without the first-level classifier is shown. As seen, over 97% correct object classification can be obtained (tests 1 and 2). If the aspect ratio estimator in the first-level classifier is not used (test 2), no performance change results; however, about 2.4 times more view classes $i$ must be checked in the second-level classifier. If the object-class estimator is not used (test 3), the number of view classes $i$ to be checked in the second-level classifier is 3.5 times larger and performance is 25% poorer. Without the first-level classifier (test 4), performance is comparable to test 3 but all 81 view classes (9 reference aspects for each of 9 objects) must be searched. Thus, the first-level classifier both improves performance and reduces the number of computations needed. The object-class estimator controls performance, and both the aspect and object-class estimators reduce computations.

In separate tests, various reference image sets with different numbers of aspect views (i.e. only four aspect views in one quadrant per object class) were used and achieved comparable results. In Table 3, the number of iterations required in the second-level classifier is quantified and the effect of varying the convergence threshold $T$ is investigated. As seen, varying $T$ has negligible effect on the percent of the images correctly identified. Larger $T$ values result in fewer iterations required (Table 3). However, we suspect that better $b$ estimates will result if more iterations are used. In Table 4, the effect of various covariance matrix $\Sigma^{-1}$ estimates is considered. The identity matrix is found to perform adequately, with only a few percent better accuracy resulting when different Fisher vectors are used to calculate $\Sigma^{-1}$.


5.  SUMMARY  AND  CONCLUSIONS 

A new classifier using a moment-based feature
space has been described. The second-level classifier
is optimal and uses the JGRV property of the features
with respect to distortions contained in b. A hierarchical
first-level classifier was included to improve
performance and reduce the computational load on the
second-level classifier. A new organized procedure for
selecting the node structure, the subsets per node and
the discriminant vector per node was advanced. A multiclass
and conventional two-class Fisher discriminant
technique was advanced to automate this procedure.
Experimental verification and quantification of all aspects
of both levels of the classifier were obtained
for a pipe part database. Excellent results were obtained.
This appears to be a most attractive and viable
feature space pattern recognition system with many
unique properties.

A two-level classifier has been designed for use in a moment-based hybrid optical/digital processor. The simulation performance
of this pattern recognition system using real IR input test images of ships and reference moments obtained from
ship models is described with emphasis given to the preprocessing operations required.


1. Introduction

The use of optical processors to compute image
features for feature-based pattern recognition has recently
received renewed interest. The optically-computed
image features thus far considered include
Fourier coefficients [1-3], chord histogram distributions
[4,5], and geometrical moments [6-8]. In this
paper, a moment-based feature extractor and classification
algorithm for pattern recognition is considered
(section 2) and its performance in the classification of
ship imagery (section 3) is addressed. Specific attention
is given to classification of real input imagery
(section 5) and the image preprocessing required
(section 4).


2.  Optical  computation  of  the  geometrical  moments 

The optical system considered to generate the moments
of an input object [7] consists of an input plane
P1 (in which the input image is placed) imaged onto a
moment generating mask at plane P2. The monomials
x^p y^q up to fifth order (p + q ≤ 5) are recorded on the
P2 mask, each spatially multiplexed using a different
spatial frequency for each carrier. The optical Fourier
transform of the light distribution leaving P2 is detected
on 21 parallel output detectors in the
P3 output plane and contains the moments

$$ m_{pq} = \iint f(x,y)\, x^p y^q \, dx\, dy \tag{1} $$

of the P1 input pattern f(x,y) as detailed in [7].
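
As a digital cross-check on Eq. (1), the 21 moments with p + q ≤ 5 can be approximated by sums over the pixels. The sketch below is only an illustration under stated assumptions: the function name and the normalization of the coordinates to [-1, 1] are choices made here, not details of the optical system in [7].

```python
import numpy as np

def geometric_moments(image, max_order=5):
    """Return {(p, q): m_pq} for all p + q <= max_order (discrete form of Eq. (1))."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    # Assumption of this sketch: coordinates are scaled to [-1, 1] so that the
    # high-order monomials stay well conditioned.
    y = np.linspace(-1.0, 1.0, rows)[:, None]
    x = np.linspace(-1.0, 1.0, cols)[None, :]
    moments = {}
    for p in range(max_order + 1):
        for q in range(max_order + 1 - p):
            moments[(p, q)] = float(np.sum(img * x**p * y**q))
    return moments

# Hypothetical binary test object on a 32 x 128 field
image = np.zeros((32, 128))
image[12:20, 30:100] = 1.0
m = geometric_moments(image)
print(len(m))  # 21 moments for p + q <= 5
```

The dictionary keys (p, q) play the role of the 21 detector positions in the output plane; m[(0, 0)] is simply the total image energy.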

These optically-generated image features are used
as inputs to a digital feature-based classifier which then
determines the object class and the orientation, scale
and aspect of the input object. The details of this classifier
are provided elsewhere [8] and are not germane
to our present discussion; however, several remarks on
the classifier follow for completeness. The optically-calculated
input moment vector m is projected by the
first-level classifier in the digital section onto a multidimensional
Fisher feature space [9]. From the location
of the projection vector, initial estimates of the
input object class are made. From the ratio of the normalized
second-order moments μ20 and μ02, an estimate
of the aspect ratio or aspect angle of the input
object is made. These estimates are used to select reference
vectors m_i(θ) for class i and aspect θ from storage
against which m is compared. The final decision
on the object class and the geometrical location of the
input object is made in a second-level classifier implementing
a nonlinear least-squares solution as detailed
in [8]. Our present concern is the preprocessing required
on real images before their moments m can be
reliably extracted.
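
The aspect estimate from the second-order moments can be illustrated with a small sketch. It assumes central moments computed about the image centroid; the mapping from the ratio to an aspect angle is given in [8] and is not reproduced here, and the function name is hypothetical.

```python
import numpy as np

def second_order_ratio(image):
    """Ratio mu20 / mu02 of the central second-order moments of an image."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    y, x = np.mgrid[0:rows, 0:cols]
    m00 = img.sum()
    xc = (img * x).sum() / m00
    yc = (img * y).sum() / m00
    mu20 = (img * (x - xc) ** 2).sum()
    mu02 = (img * (y - yc) ** 2).sum()
    # Any common normalization (e.g. division by m00**2) cancels in the ratio.
    return mu20 / mu02

# Hypothetical silhouettes: a long thin object and a compact one
broadside = np.zeros((32, 128))
broadside[14:18, 20:100] = 1.0
bow = np.zeros((32, 128))
bow[10:22, 58:70] = 1.0

ratio_broadside = second_order_ratio(broadside)
ratio_bow = second_order_ratio(bow)
```

A large ratio indicates an elongated (broadside-like) view and a ratio near one a compact (bow- or stern-like) view, which is the kind of coarse cue used to limit the reference aspects searched.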
3. Database

As our reference database we used 180 images of
five types of ships with 36 images available per ship
(at 10° intervals around each ship, from a 90° depression
angle). This reference database was obtained from
ship models under controlled conditions. Each image
contains 128 × 32 pixels with about 2000 pixels corresponding
to the ship (for the broadside view) and
less than 200 ship pixels (for the bow and stern views).
The moments of 4 images per class (10°, 30°, 50° and
80°, where 0° is the bow view and 90° is the broadside
view) constituted our reference m_i(θ) database. As test
data, we used various real images of the class 2 ship
(the Leahy). A typical image is shown in fig. 1. It
shows the ship in water with a sky and shoreline background.
We used 256 × 128 pixel images with 8 bits
of gray scale for the real ships in our tests. The horizon
(separating the water and the sky background) is
seen and the depression viewing angle for the real
images is 80° (rather than 90°, as in the reference
imagery). The real image (from bottom to top) contains
four regions: (1) water, (2) the hull of the ship
and some water, (3) the superstructure of the ship with
a water background, and (4) the sky and shoreline at
the top of the image. In section 4, we detail the preprocessing
used to extract the ship from the background
and in section 5, we discuss the classification
performance obtained on such imagery.

4.  Image  preprocessing 

Fig. 1. Typical ship test image (the guided-missile cruiser, the
Leahy, ship class 2).

Fig. 2. Bimodal gray-level histogram of fig. 1.

Feature-extraction pattern recognition algorithms
require that one object location within the input
field-of-view be extracted before the features are computed.
These  operations  are  most  commonly  referred  to  as 
segmentation  and  also  involve  noise  removal  and  filling 
in  of  holes  on  the  object  [10] .  Care  should  be  taken 
to  employ  only  simple  image  preprocessing  operations 
that  are  not  computationally  expensive.  Thus,  we  used 
mainly  histogram  operations  (since  they  require  only 
simple  tallies  of  image  pixel  levels)  to  aid  in  threshold 
selections.  A  wealth  of  such  methods  exist,  but  their 
specific  implementations  are  quite  problem-dependent. 
In  our  case,  we  used  context  information  (the  water  is 
below  the  ship,  the  sky  is  above  the  ship  and  the  deck 
line  and  horizon  are  nearly  horizontal  due  to  the  sen¬ 
sor  system  used)  to  greatly  simplify  the  ship  segmenta¬ 
tion.  Our  approach  is  quite  novel  in  the  techniques 
employed  to  select  separate  thresholds  for  the  differ¬ 
ent  image  regions  and  dynamically  select  these  regions 
based  on  the  scene  information.  Such  methods  are  of 
use  in  feature  extractors  for  diverse  applications. 

As step 1, we formed the gray-level histogram of
fig. 1 (see fig. 2). It was bimodal, as expected, extending
from 0 to 255 (8 bits). A broad peak exists at low
pixel values (corresponding to the water and noise,
which is low in intensity in fig. 1) and a sharper peak
is centered at the high pixel level of 175 (corresponding
to the ship and the sky, whose pixel values are larger
in fig. 1). A well-defined valley exists at pixel level 150.
Thus, at step 2, we thresholded the image at 150 (with
all pixel values below 150 set to zero and all pixel values
above 150 set to one). The resultant binary image
is shown in fig. 3.
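
Steps 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually coded for the tests: the smoothing width, the minimum peak separation and the synthetic two-mode image are all hypothetical choices.

```python
import numpy as np

def valley_threshold(image, smooth=9, min_separation=32):
    """Return the gray level at the valley between the two main histogram peaks."""
    levels = np.clip(np.rint(image), 0, 255).astype(np.uint8)
    hist = np.bincount(levels.ravel(), minlength=256).astype(float)
    # Light smoothing so single-bin fluctuations are not mistaken for peaks
    hist_s = np.convolve(hist, np.ones(smooth) / smooth, mode="same")
    p1 = int(np.argmax(hist_s))
    # Second mode: highest bin at least min_separation levels from the first
    masked = hist_s.copy()
    masked[max(0, p1 - min_separation):p1 + min_separation + 1] = -1.0
    p2 = int(np.argmax(masked))
    lo, hi = sorted((p1, p2))
    return lo + int(np.argmin(hist_s[lo:hi + 1]))

# Synthetic stand-in for fig. 1: a broad dark mode and a sharp bright mode
rng = np.random.default_rng(0)
water = rng.normal(60.0, 20.0, 20000)
bright = rng.normal(175.0, 8.0, 10000)
image = np.clip(np.concatenate([water, bright]), 0, 255).reshape(100, 300)

t = valley_threshold(image)
binary = (image > t).astype(np.uint8)  # step 2: ones above the valley, zeros below
```

Only a tally of pixel levels and a one-dimensional search are needed, which is consistent with the low computational cost sought for the preprocessing.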

Fig. 3. Binary version of fig. 1 thresholded from the bimodal
gray-level histogram of fig. 2.

At step 3, the image in fig. 3 is used to estimate the
location of the four image regions defined in section
3. To achieve this, a horizontal or row-projection
histogram of fig. 3 is formed. This is a graph (fig. 4) of
the number of pixels with value equal to one in each
row of fig. 3. From fig. 4, the different image regions
can be identified. The region to the right of row C
(with zero-valued pixels) is the water below the ship.
The flatter region just to the left of row C is the hull.
The region between row B and where the hull occurs
contains the ship's superstructure (plus water background).
The sky and shoreline lie in the region to the
left of row A. Between rows A and B is a transition region
between the sky and water which contains the
horizon region with some sky, water and ship superstructure.
Rows A and C are easily defined and located.
Row B was located from the sum of first differences
for consecutive row values as the inflection point in
the histogram. These procedures are all automated and
require only simple computations.
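
The row-projection histogram of step 3 amounts to a sum along each image row. The sketch below shows the projection, its first differences, and one simple way to find row C on a synthetic binary image; the exact rules used to place rows A and B are not spelled out in enough detail to reproduce, so they are not attempted, and the image layout is hypothetical.

```python
import numpy as np

def row_projection(binary):
    """Count the one-valued pixels in each row (row 0 is the top of the image)."""
    return np.asarray(binary, dtype=int).sum(axis=1)

# Synthetic binary image: bright sky on top, a ship in the middle, dark water below
binary = np.zeros((64, 64), dtype=int)
binary[0:20, :] = 1          # sky and shoreline rows
binary[20:30, 28:36] = 1     # superstructure rows (narrow)
binary[30:40, 12:52] = 1     # hull rows (wide, nearly flat profile)

profile = row_projection(binary)
first_diff = np.diff(profile)                 # change between consecutive rows
row_c = int(np.flatnonzero(profile)[-1]) + 1  # first all-zero row below the hull
```

Large magnitudes in first_diff mark the boundaries between regions, which is the information the sum of first differences exploits when locating row B.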

Fig. 4. Horizontal projection histogram of the binary image
of fig. 3. The sky, ship, superstructure and water regions of
the image are noted.

Fig. 5. Gray-level histogram of the gray-scale image in fig. 1
after subtraction of the means of the sky and water from the
appropriate image rows.

In step 4, the values for rows A, B and C from fig. 4
are used to extract the sky only (top row to row A)
and water only (row A to the bottom row) region of
the original grayscale image. Specifically, the average
pixel values in these two image regions are calculated.
This involves only a simple sum of the pixel levels in
the proper rows of fig. 1. In step 5, the mean value of
the sky and shoreline region is subtracted from the
rows above A in fig. 1, the mean value of the water region
is subtracted from the rows below C in fig. 1, and
a linear combination of the means of the water and sky
is subtracted from the rows between A and B. This
produces an image with the ship pixels on a positive
bias and with the water and sky regions on a zero bias.
In step 6, the gray-level histogram of this image is
formed. As shown in fig. 5, it has an obvious bimodal
structure with a very apparent threshold level or valley
point at pixel value V_T.
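
Steps 4 and 5 can be sketched as below. Several details are assumptions made only for illustration: the water mean is estimated from the rows below C, the "linear combination" is taken to be a linear ramp from the sky mean to the water mean, and rows between B and C are left untouched because their treatment is not stated.

```python
import numpy as np

def subtract_region_means(gray, row_a, row_b, row_c):
    """Remove the sky and water bias from a gray-scale image, region by region."""
    img = np.asarray(gray, dtype=float)
    out = img.copy()
    sky_mean = img[:row_a].mean()      # step 4: rows above A (sky and shoreline)
    water_mean = img[row_c:].mean()    # step 4: rows below C (assumed water only)
    out[:row_a] -= sky_mean            # step 5: sky region
    out[row_c:] -= water_mean          # step 5: water region
    n = row_b - row_a
    if n > 0:
        # Assumed form of the linear combination: a ramp from sky mean to water mean
        w = np.linspace(0.0, 1.0, n)[:, None]
        out[row_a:row_b] -= (1.0 - w) * sky_mean + w * water_mean
    # Rows between B and C are left unchanged (not specified in the text)
    return out

# Hypothetical 60-row image: sky (180), transition (120), ship on water, water (50)
gray = np.full((60, 10), 50.0)
gray[:10] = 180.0
gray[10:20] = 120.0
gray[20:40, 3:7] = 175.0
out = subtract_region_means(gray, row_a=10, row_b=20, row_c=40)
```

After this step the pure sky and pure water rows sit at zero, so the histogram of step 6 separates background from ship with a single threshold.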

Fig. 6. Segmented ship image produced using the threshold
level V_T found from fig. 5.

In step 7, all pixels in the image with gray-level values
below V_T in fig. 5 are set to zero. This removes the
sky, shoreline and water and thus extracts the ship. If
the gray-levels above V_T are retained, a gray-scale segmented
ship image results. If levels above V_T are set to
unity, a binary segmented ship image results (fig. 6).
Simple median filtering or other local convolution operations
can be used to suppress miscellaneous noise
pixels remaining in the background and to fill in holes
on the target object.
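
Step 7 and the clean-up it mentions can be sketched as follows. The 3 × 3 window and the edge-replicating border are assumptions of this sketch; the text only calls for "simple median filtering or other local convolution operations".

```python
import numpy as np

def segment(image, v_t, binary=True):
    """Zero all pixels below v_t; keep gray levels or set survivors to one."""
    img = np.asarray(image, dtype=float)
    mask = img >= v_t
    return mask.astype(float) if binary else np.where(mask, img, 0.0)

def median3x3(image):
    """3 x 3 median filter with edge replication at the image border."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    padded = np.pad(img, 1, mode="edge")
    # Stack the nine shifted copies of the image and take the median across them
    shifted = [padded[r:r + rows, c:c + cols] for r in range(3) for c in range(3)]
    return np.median(np.stack(shifted), axis=0)

# Hypothetical image: an object with a one-pixel hole plus an isolated noise pixel
image = np.zeros((12, 12))
image[3:9, 3:9] = 10.0
image[5, 5] = 0.0
image[10, 10] = 10.0

binary = segment(image, v_t=5.0)
cleaned = median3x3(binary)
```

The median fills the single-pixel hole and removes the isolated pixel, at the cost of rounding the object corners slightly.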

5.  Image  classification 

The moments m of the image in fig. 6 were computed
and fed to our digital first-level Fisher projection
class estimator. This first-level classifier omitted class 1
and 3 ships as possible class matches. The second-level
classifier returned class 2 as the most likely object
class. This classifier also provides confidence levels for
each possible ship class (classes 2, 4 and 5) passed by
the first-level classifier. The class 4 ship, another
guided-missile cruiser, had the second-best confidence
but it was considerably worse than that of the best (and
correct) class 2 match. The correct aspect angle (70°) and
scale (50%) of the input object are also provided by
the classifier.


6.  Summary  and  conclusion 

A necessary aspect of feature extractors for pattern
recognition is the image preprocessing required. A
novel digital segmentation preprocessing procedure of
quite general use was detailed for a ship pattern recognition
scenario. Such operations are essential if optical
or digital feature extraction processors are to achieve
good performance. The successful classification of a
real input image using moment features and a unique
two-level classifier was demonstrated. Similar results
were obtained for other real images.
formulation

A most attractive approach to distortion-invariant pattern recognition uses a synthetic discriminant function
(SDF) as the matched spatial filter in a correlator. In this paper, we (1) provide a general basis function
and hyperspace description of SDFs, (2) advance a derivation showing the generality of the correlation matrix
observation space that we use in our filter synthesis, and (3) detail a unified SDF filter synthesis technique
for five different types of pattern recognition problem.

I. Introduction

Correlators are well known to be powerful systems
and architectures that can recognize multiple occurrences
of an object in the presence of noise. Optical
systems using holographic matched spatial filters
(MSFs)¹ easily perform the correlation function.
However, the performance of a correlator rapidly degrades
when distortions are present in the input image.
Various approaches have been advanced in recent years
to achieve distortion-invariant pattern recognition.
None has yet demonstrated such performance while
retaining the shift-invariant feature of a correlator. If
shift-invariance is not required, a correlation approach
is still preferable to feature extraction techniques for
distortion-invariant pattern recognition because the
processing gain (PG) of a correlator allows more input
noise to be present.

In this paper, we detail a generalized method to
achieve multiobject shift-invariant and distortion-invariant
pattern recognition using a correlator. This
technique uses a synthetic discriminant function (SDF)
to form the MSF for use in a correlator. The SDF
synthesis technique achieves the distortion-invariance,
whereas the use of an MSF correlator provides the PG
and shift-invariance. This SDF is similar to averaged
filters²,³ and generalized matched filters.⁴ However,
the filter synthesis and computational technique we
use⁵ are most general. In Sec. II, we discuss this pattern
recognition problem and SDF synthesis using a modified
hyperspace description. In Sec. III, we describe our
filter synthesis in terms of general 2-D basis functions,
and we show that a correlation matrix observation space
results directly and yields a SDF synthesis technique.
In Sec. IV, we detail the synthesis of five different types
of SDF for different pattern recognition applications
(using our general filter synthesis description). As we
show, all SDFs can be derived from the same basic
matrix-vector equation.

The author is with Carnegie-Mellon University, Department of
Electrical & Computer Engineering, Pittsburgh, Pennsylvania
15213.

Received 21 November 1983.

0003-6935/84/101620-08$02.00/0.
© 1984 Optical Society of America.

We restrict attention to the use of a conventional
correlator (modified to use an MSF of an SDF). In such
an architecture, the positions of the output correlation
peaks denote the locations of the objects in the input
field of view. This differs from coded-phase processors⁶,⁷
in which the location of the output peak determines
the class of the input object. Such processors are
not shift-invariant and require that only one object be
present in the input field of view. The SDF concept we
advance can be viewed as an extension and reformulation
of the use of correlators with multiple MSFs (one
per object class) and multiple correlation outputs. As
noted in Ref. 8, one can obtain better performance from
a multichannel correlator by forming a linear combination
of the multiple correlation outputs (compared
to the performance that results if we simply select the
single correlation output with the largest peak value).
Our filter synthesis technique forms one MSF that is a
linear combination of the MSFs of the different object
classes being considered. However, we form this filter
in the image plane and then by a conjugate Fourier
transform construct the MSF. This approach might
appear to differ only slightly from others. However, as
we show (Secs. II and III), it is much more general (since
synthesis of a SDF directly in the MSF Fourier transform
plane restricts the basis functions used to be
Fourier coefficients or exponentials etc.), and it is also
much easier to compute (as we show in Sec. III).
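
The equivalence between a linear combination of several correlation outputs and a single correlation with a linear-combination filter follows from the linearity of correlation. The following sketch checks this numerically with an FFT-based circular correlation; the random arrays and the weights are hypothetical stand-ins for class filters and combination coefficients.

```python
import numpy as np

def correlate(image, filt):
    """Circular 2-D correlation via the FFT (filter zero-padded to the image size)."""
    F = np.fft.fft2(image)
    H = np.fft.fft2(filt, s=image.shape)
    # Multiplying by the conjugate transform is the matched-filter operation
    return np.real(np.fft.ifft2(F * np.conj(H)))

rng = np.random.default_rng(1)
scene = rng.random((32, 32))
h1, h2 = rng.random((8, 8)), rng.random((8, 8))  # stand-ins for two class filters
a1, a2 = 0.7, -0.3                                # hypothetical combination weights

combined_outputs = a1 * correlate(scene, h1) + a2 * correlate(scene, h2)
single_filter_output = correlate(scene, a1 * h1 + a2 * h2)
```

The two arrays agree to numerical precision, so one filter formed in the image plane replaces the multichannel correlator and its output combiner.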

II. Hyperspace SDF Formulation

A hyperspace description of an SDF is the most
conventional pattern recognition approach.⁹ This
approach to SDF descriptions was first advanced in Ref.
10. Conventional pattern recognition uses a feature
vector representation in which each input image is described
by its projections on different scalar image
features.⁹ Each input image is then described by a
feature  vector  in  a  hyperspace  (whose  axes  are  the  scalar 
image  features  considered  to  be  of  importance).  In  Fig. 
1,  we  show  a  typical  representation  of  how  one  might 
desire  two  classes  of  data  to  be  displayed  in  a  simplified 
two-axis  hyperspace.  Objects  in  the  two  classes  (rep¬ 
resented  by  Xs  and  Os,  respectively,  in  Fig.  1)  should 
be  widely  separated,  and  objects  within  each  class 
should  cluster  in  a  small  region  in  this  display  space.  In 
conventional  pattern  recognition,  the  basis  functions 
or  object  features  (the  axes  of  the  hyperspace)  are 
usually  scalar  features.  In  our  description,  we  consider 
the  use  of  a  hyperspace  with  2-D  spatial  basis  functions 
as  the  axes  or  object  features.  This  will  clearly  greatly 
increase  the  power  of  such  a  pattern  recognition  rep¬ 
resentation.  The  major  problem  is  the  selection  of  the 
object  features  (the  axes  of  the  hyperspace  or  equiva¬ 
lently  the  basis  functions)  to  achieve  the  desired  sepa¬ 
ration  of  different  classes  and  the  clustering  of  data 
within  each  class.  Most  techniques  that  have  been 
suggested  to  achieve  this  are  rather  ad  hoc.  However, 
our  approach  is  automatic  (as  will  be  shown). 

We  thus  consider  an  advanced  and  modified  hyper¬ 
space  in  which  the  basis  functions  are  2-D  spatial  image 
functions  rather  than  scalars.  We  retain  the  same  basic 
concepts  used  in  conventional  hyperspace  feature-space 
pattern recognition. For example, if a line or a hyperplane
(shown in Fig. 1) can be drawn that separates the
two  image  classes,  the  normal  to  this  plane  from  the 
origin  defines  the  discriminant  function  to  be  used.  In 
conventional  pattern  recognition  with  scalar  basis 
functions,  an  input  object  is  described  by  its  features, 
and  these  comprise  the  elements  of  the  feature  vector 
that  describes  the  input  object.  When  this  feature 
vector  is  projected  onto  the  discriminant  feature  vector, 
a  decision  on  the  class  of  the  input  object  is  made  (based 
upon  the  value  of  the  projection). 

In our modified hyperspace formulation, we retain
the major elements of conventional pattern recognition:
the concepts of basis functions, discriminant functions,
etc. However, in our formulation, each of these now
becomes a 2-D spatial function. Since the basis functions
are 2-D, so is the discriminant function and so is
the input image in our representation. We can simply
project the 2-D input image onto the SDF as in the
conventional case of image and feature vectors. However,
the result will be valid for only one location of the
object in the input field of view. To see this, recall that
our basis functions are 2-D spatial functions; thus each
shifted version of an object corresponds to a new point
in our hyperspace. All these points (for one object) lie
on the surface of a hypersphere (since shifted versions
of an object have the same energy). Clearly, a conventional
hyperspace description becomes very complicated
if shifted versions of the input object are required
to be recognized. Thus, in our modified hyperspace
description, we retain the simplicity of a single
vector representation of an object and the definition of
the discriminant function as the normal from the origin
to the discriminant hypersurface separating regions
containing different object classes. To provide shift-invariance,
we correlate (a 2-D spatial correlation) the
discriminant function with the input image and use the
hyperspace concept only to synthesize the discriminant
function to be used. Since any shifted version of an
object can be used to synthesize a MSF in a correlator,
we need select only one shifted version of each object
class in our hyperspace representation and for our
discriminant function synthesis. We select the specific
shifted version used for each object class based upon
maximum common information concepts as we detailed
in Ref. 11. A maximum common information SDF then
results.

Fig. 1. Simplified two-axis hyperspace description of distortion-invariant
multiclass shift-invariant pattern recognition using a feature
vector and discriminant vector hyperspace concept.

The selection of the specific shifted version of each
object class to be used can often be simply achieved by
colocating the centroids of all the objects. In specific
cases, small shifts from the centroid-centered images
are needed if optimum performance is to be achieved.
In general, sufficient performance results from the use
of centroid-shifted objects alone.¹¹ Since this and other
pattern recognition techniques employ training sets of
data for the different object classes, such flexibility in
the selection of the images used in the hyperspace description
is quite valid and appropriate. The general
approach is described in the simple system diagram of
Fig. 2. We use several different images of each object
class for filter synthesis and to form the hyperspace
diagram. These images can be and usually are different
geometrically distorted views of each object class.
These are referred to as the image training set. They
are used to determine the basis function to be used, to
select the discriminant hypersurface, and hence to define
the discriminant function itself. The training set
of images is chosen to provide a valid statistical representation
of each object class. The SDF algorithm itself
provides the discrimination (as we will detail in Sec.
III).

Fig. 2. Simplified block diagram of the off-line synthetic discriminant
function synthesis from training set data and the use of such
filters for on-line correlation of the SDF with unknown input
imagery.
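
Colocating the centroids of the training images can be sketched as a shift of each image so that its intensity centroid falls on the array center. The sketch assumes integer-pixel shifts and an object surrounded by a zero background (the circular shift used here wraps around the edges); the function name is hypothetical.

```python
import numpy as np

def center_on_centroid(image):
    """Shift an image so that its intensity centroid lies at the array center."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    y, x = np.mgrid[0:rows, 0:cols]
    total = img.sum()
    yc = (img * y).sum() / total
    xc = (img * x).sum() / total
    dy = int(round(rows // 2 - yc))
    dx = int(round(cols // 2 - xc))
    # np.roll wraps around, so a zero background around the object is assumed
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

# Hypothetical training image with a single bright pixel away from the center
image = np.zeros((16, 16))
image[2, 3] = 1.0
centered = center_on_centroid(image)
```

Applying the same function to every training image gives the centroid-shifted set from which the filter is synthesized; any further small shifts for optimum performance would be applied on top of this.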

This  entire  filter  synthesis  operation  is  performed 
off-line  on  training  set  images.  From  such  computa¬ 
tions,  an  SDF  h(x,y)  is  produced,  which  is  then  corre¬ 
lated  with  new  input  imagery  in  a  MSF  correlator  as 
shown  in  Fig.  2.  This  new  input  imagery  is  referred  to 
as  test  imagery.  These  test  images  are  not  members  of 
the  training  set  of  images.  For  generality,  a  prepro¬ 
cessing  box  is  included  in  Fig.  2.  This  can  perform  edge 
enhancement,  median  filtering,  or  similar  operations. 
In  general,  this  preprocessing  function  can  be  omitted 
or  restricted  to  quite  simple  operations  (because  of  the 
processing  gain  of  a  correlator). 
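
The on-line half of Fig. 2 can be illustrated digitally: the filter h(x,y) is correlated with the input scene, and the position of the output peak gives the object location wherever the object appears. The sketch uses an FFT-based circular correlation as a stand-in for the optical MSF correlator, and the random pattern used as the filter is hypothetical.

```python
import numpy as np

def correlation_peak(scene, h):
    """Correlate scene with filter h and return the output plane and peak location."""
    S = np.fft.fft2(scene)
    H = np.fft.fft2(h, s=scene.shape)        # zero-pad the filter to the scene size
    corr = np.real(np.fft.ifft2(S * np.conj(H)))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return corr, (int(peak[0]), int(peak[1]))

rng = np.random.default_rng(2)
h = rng.random((5, 5)) + 0.5                 # stand-in for a synthesized filter
scene = np.zeros((64, 64))
scene[20:25, 37:42] = h                      # the same pattern placed at row 20, column 37

corr, peak = correlation_peak(scene, h)
```

Moving the pattern elsewhere in the scene moves the peak by the same amount without any change to the filter, which is the shift-invariance the correlator contributes.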

III.  Correlation  Observation  Space 

In this section, we discuss an automated technique
to select the basis functions and the SDF described in
Sec. II. The general SDF formulation we employ uses
a correlation observation space. To justify the generality
of this technique, we devote this section to a derivation
of it as the most general set of features to be used
in synthesizing and computing a SDF MSF for use in
a correlator. Our formulation uses a general set of basis
functions and involves an automated technique to select
them. To develop our general SDF synthesis technique,
we consider N training set images of an object in
class one. These N images can represent different
distorted versions of this one object. For simplicity, we
consider the synthesis of an equal correlation peak
(ECP) SDF. This filter function h(x,y) has the property
that the correlation output of h(x,y) and all images
{f_n(x,y)} in class one equals a constant (unity is chosen
for this constant), i.e.,

$$ f_n(x,y) \circledast h(x,y) = 1. \tag{1} $$

In Sec. IV, we extend the basic algorithm in this section
to other pattern recognition applications and other
types of SDF. When the different images {f_n(x,y)} are
different geometrically distorted versions of one object
f(x,y), this ECP SDF is appropriate for an intraclass
pattern recognition problem (recognition of any distorted
version of an object using a single filter function).

To develop formally an algorithm for synthesis of a
filter function h(x,y) that satisfies Eq. (1), we describe
each training set image as a linear combination of a basis
function set φ_m(x,y), i.e.,

$$ f_n(x,y) = \sum_m a_{nm}\,\phi_m(x,y). \tag{2} $$

This follows directly from our hyperspace description
in Sec. II. We place no specific restrictions on the basis
function set; i.e., we do not assume a Fourier coefficient
basis function set as in Ref. 4 or the use of circular harmonics
as in Ref. 12. The desired SDF is described as
another linear combination of the same basis function
set. This is compatible with the conventional hyperspace
description in Fig. 1 and Sec. II, extended to the
case of 2-D basis functions, i.e.,

$$ h(x,y) = \sum_m b_m\,\phi_m(x,y). \tag{3} $$

Assuming an orthonormal set of basis functions (as
is conventional in pattern recognition), we can substitute
Eqs. (2) and (3) into Eq. (1) and rewrite the ECP SDF
condition in Eq. (1) as

$$ f_n(x,y) \circledast h(x,y) = \mathbf{f}_n \cdot \mathbf{h} = \sum_m a_{nm} b_m = 1. \tag{4} $$

Next we note that since h(x,y) is a linear combination
of the φ_m(x,y) and so is f_n(x,y), we can write h(x,y) as
a linear combination of the input training set of images
{f_n(x,y)}; i.e., we first write out several of the terms in
Eq. (3):

$$ h(x,y) = b_1\phi_1(x,y) + b_2\phi_2(x,y) + \cdots = \sum_m b_m\,\phi_m(x,y). \tag{5} $$

From Eq. (2), we can write the basis function set
φ_m(x,y) as a linear combination of the training set of
images f_n(x,y) as

$$ \phi_m(x,y) = \sum_n d_{mn}\, f_n(x,y). \tag{6} $$

Substituting Eq. (6) into Eq. (3), we obtain

$$ h(x,y) = b_1 \sum_n d_{1n} f_n(x,y) + b_2 \sum_n d_{2n} f_n(x,y) + \cdots \tag{7} $$

$$ = e_1 f_1(x,y) + e_2 f_2(x,y) + \cdots \tag{7a} $$

$$ = \sum_m e_m f_m(x,y). \tag{7b} $$

In Eq. (7a), we have grouped all coefficients of f_1, f_2, etc.
together and have denoted them by e_1, e_2, etc. The
final results in Eqs. (7b) and (3) are equivalent; one describes
the SDF in terms of the basis functions [Eq. (3)],
and the other [Eq. (7b)] describes it in terms of the
original training set of images.
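
The grouping of coefficients between Eqs. (7) and (7a) can be written out explicitly. The step below is a worked restatement of that regrouping, using only the symbols b_m, d_mn and f_n already defined in Eqs. (3) and (6):

$$ h(x,y) = \sum_m b_m \sum_n d_{mn} f_n(x,y) = \sum_n \Big( \sum_m b_m d_{mn} \Big) f_n(x,y), \qquad \text{so that} \qquad e_n = \sum_m b_m d_{mn}. $$

Each e_n is thus fixed once the b_m and d_mn are known, but the development that follows never needs those quantities individually; only the e_n are solved for.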

Wc  now  consider  how  to  determine  the  em  in  Eq.  (7b) 
to  satisfy  our  ECP  SDF  criteria  in  Eq.  (1)  or  (4).  For 
notational  simplicity,  we  describe  all  images  [the  SDF 
h[x,y )  and  the  training  set  images)  by  vectors  h  and  f„ 
or  fm,  respectively.  This  notation  and  description 
follow  directly  from  the  hyperspace  model  advanced  in 
Sec.  II.  We  denote  the  correlation  of  two  such  vector 
functions  by  the  vector  inner  product,  which  we  write 
simply  as  fn  •  h.  Since  we  use  a  correlator  for  our  final 
object  classification,  and  since  any  shifted  version  of  an 
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image  can  be  used  as  the  MSF  in  a  correlator,  there  is 
no  loss  of  generality  in  this  simplified  formulation. 

With  these  preliminaries,  the  EOF  SDK  requirement 
in  Eq.  (1 )  is  now  written  as 

f„  •  h  =  t  ihi 

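Substituting Eq. (7b) into Eq. (8) for $\mathbf{h}$, rearranging
terms, and defining $r_{mn} = \mathbf{f}_n \cdot \mathbf{f}_m$ as the elements of the
correlation matrix $\mathbf{R}$, Eq. (8) becomes

$$ \mathbf{f}_n \cdot \mathbf{h} = \mathbf{f}_n \cdot \Big( \sum_m e_m \mathbf{f}_m \Big) = \sum_m e_m (\mathbf{f}_n \cdot \mathbf{f}_m) = \sum_m e_m r_{mn} = 1. \qquad (9) $$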

In matrix-vector form, we rewrite Eq. (9) as

$$ \mathbf{R}\mathbf{e} = \mathbf{u}, \qquad (10) $$

where $\mathbf{u}$ denotes the unit vector and where the elements
of the vector $\mathbf{e}$ are the $e_m$ in Eq. (7b) or (9). The solution
for the ECP SDF $h(x,y)$ defined by Eq. (7b) that
satisfies Eq. (1) is thus given by the solution to Eq. (10),
i.e.,

$$ \mathbf{e} = \mathbf{R}^{-1}\mathbf{u}. \qquad (11) $$

From this general formulation, we have shown that
a correlation matrix observation space directly results
as an ideal feature space from which to compute the
required coefficients for a linear combination filter such
as an SDF. We note that this formulation used a general
basis function set $\phi_m(x,y)$, but that in our algorithm
no specific choice for the basis function set was required.
Thus, to synthesize an SDF, we simply form the correlation
matrix of the training set of data, invert it, and
multiply it by the appropriate vector $\mathbf{u}$. This discriminant
function formulation is thus automatic and does
not require ad hoc selection of certain basis functions
or input features. We first advanced the fundamentals
of this unified correlation matrix observation space
description in Ref. 5. This present description is a
revised and more general version of our original algorithm
for  synthesis  of  an  averaged  filter2-1  with  the  removal 
of any specific requirements or selection techniques for
the basis functions used. In Sec. IV, we develop a
general formulation along the lines of Eq. (11)
for the synthesis of several different SDFs for different
pattern recognition applications.
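
The synthesis recipe of Eqs. (7b), (10), and (11) can be sketched in a few lines. The following is only an illustrative sketch, not the authors' implementation: the helper names (`inner`, `solve`, `ecp_sdf`) and the three 4-pixel training "images" are hypothetical, the images are treated as flattened vectors, and the plain Gauss-Jordan solver stands in for whatever matrix inversion method is actually used.

```python
def inner(f, g):
    """Inner product of two flattened images (correlation value at zero shift)."""
    return sum(a * b for a, b in zip(f, g))


def solve(R, u):
    """Solve R e = u by Gauss-Jordan elimination with partial pivoting."""
    n = len(R)
    # Augmented matrix [R | u]; copied so the caller's data is untouched.
    aug = [[float(v) for v in R[i]] + [float(u[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("R is singular; a generalized inverse is needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def ecp_sdf(images):
    """Return (e, h): coefficients from Eq. (11) and the filter from Eq. (7b)."""
    # Correlation matrix of the training set, r_mn = f_n . f_m.
    R = [[inner(fn, fm) for fm in images] for fn in images]
    u = [1.0] * len(images)  # unit vector: every element is unity
    e = solve(R, u)
    # Linear combination of the training images, Eq. (7b).
    h = [sum(e[m] * images[m][k] for m in range(len(images)))
         for k in range(len(images[0]))]
    return e, h


# Hypothetical 4-pixel "images" standing in for distorted views of one object.
training_set = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 1]]
e, h = ecp_sdf(training_set)
peaks = [inner(f, h) for f in training_set]
```

Under these assumptions every entry of `peaks` equals 1 to within rounding, which is the equal-correlation-peak condition of Eq. (8).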

Many techniques exist by which a general basis
function set can be obtained. In Refs. 2, 3, 10, and 11,
we used a Gram-Schmidt procedure to select orthogonal
basis  functions.  In  Refs,  f  and  7  a  Fukunaga-Koontz 
and Foley-Sammon technique is employed. In Refs.
10 and 13, Karhunen-Loeve transforms were suggested
for similar problems. In Ref. 14, singular value decomposition
techniques were described. Our present
algorithm can accommodate any of these methods, but
by our new generalized description, we require no specific
basis function selection. However, these prior
techniques are useful as intermediate steps in performing
the required correlation matrix inversion in Eq.
(11). No specific guidelines for matrix inversion techniques
are advanced in this present paper, since we
desire to retain a general description. However, we note
that if $\mathbf{R}$ is singular, we employ a generalized inverse,
and if the dimension of $\mathbf{R}$ is large, we use several new
computationally efficient methods such as on-line
dominant-image  calculation1  r>  and  orthogonal-hyper- 
plane-projection  methods. ,r’ 
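
As a small worked illustration of the singular case, suppose (hypothetically) that the training set contains the same unit-energy image twice, $\mathbf{f}_1 = \mathbf{f}_2 = \mathbf{f}$ with $\mathbf{f} \cdot \mathbf{f} = 1$. The text does not say which generalized inverse is meant; the Moore-Penrose pseudoinverse $\mathbf{R}^{+}$ is assumed here purely for illustration.

```latex
\mathbf{R} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
\quad (\det \mathbf{R} = 0), \qquad
\mathbf{R}^{+} = \frac{1}{4}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad
\mathbf{e} = \mathbf{R}^{+}\mathbf{u}
           = \frac{1}{4}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
             \begin{bmatrix} 1 \\ 1 \end{bmatrix}
           = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix},
\\[4pt]
\mathbf{h} = \tfrac{1}{2}\mathbf{f}_1 + \tfrac{1}{2}\mathbf{f}_2 = \mathbf{f},
\qquad
\mathbf{f}_n \cdot \mathbf{h} = \mathbf{f} \cdot \mathbf{f} = 1 \quad (n = 1, 2).
```

In this example $\mathbf{R}\mathbf{e} = \mathbf{u}$ still holds, so the equal-correlation-peak condition of Eq. (8) is met even though $\mathbf{R}^{-1}$ does not exist.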

IV.  Generalized  SDF  Synthesis 

In this section, we describe five general SDFs and
detail their synthesis in the form of Eq. (11) and their
use for different pattern recognition problems and
applications. The computational ease with which these
useful pattern recognition filters can be fabricated is
quite attractive. The filters to be described include a
more unified and general description (Sec. IV.A) of the
ECP SDF of Sec. III (for intraclass pattern recognition),
a mutual orthogonal function (MOF) SDF (Sec. IV.B)
for M-class interclass pattern recognition, an MOF SDF
for two-class and multiclass interclass discrimination
as well as intraclass recognition (Sec. IV.C), a new
multiclass MOF SDF using several SDFs (Sec. IV.D),
and another new simple nonredundant filter (NRF)
SDF for intraclass and interclass pattern recognition
(Sec. IV.E).

A. Equal Correlation Peak SDFs for Intraclass Pattern
Recognition 

The general formulation for the ECP SDF satisfying
the condition in Eq. (1) can be described (for a training
set of $N_1$ images and an associated $N_1 \times N_1$ correlation
matrix $\mathbf{R}_1$) as

$$ \mathbf{a} = \mathbf{R}_1^{-1}\mathbf{u}_1, \qquad \mathbf{u}_1 = [1, 1, \ldots, 1]^T, \qquad (12) $$

where the unit vector $\mathbf{u}_1$ has $N_1$ elements (all of which
are unity). The elements of $\mathbf{a}$ are the weighting coefficients
in the linear combination SDF:

$$ h(x,y) = \sum_n a_n f_n(x,y), \qquad (13) $$

where the $\{f_n(x,y)\}$ training set images associated with
the correlation matrix $\mathbf{R}_1$ are different distorted versions
of the same object. This SDF filter $h(x,y)$ when used
in a correlator is thus capable of intraclass distortion-invariant
pattern recognition (i.e., recognition of different
distorted views of one class of object). Such an
ECP SDF yields the same correlation output for all
distorted views of one object as required by Eq. (1). In
other extensions of this general SDF synthesis algorithm
to other pattern recognition applications (beyond
intraclass object recognition), we begin by describing the
SDF as a linear combination of the training set of images.
As shown in Sec. III, such a formulation emerges
directly from our hyperspace description.

B.  Mutual  Orthogonal  Function  SDF  for  Interclass 
Pattern  Recognition 

Next we consider an interclass pattern recognition
problem (the discrimination between and recognition
of M different objects in M different classes). In this
initial example, we assume one image per object class,
and we consider only interclass discrimination rather
than intraclass recognition of distorted versions of each
object. We describe our training set of M images (one
per object class) by $\{f_m(x,y)\}$, and we denote the $M \times M$
correlation matrix of this training set of data by $\mathbf{R}_2$. For
this problem, we desire to produce M SDFs $\{h_m\} =
h_1, h_2, \ldots$ so that

$$ f_j(x,y) \circledast h_i(x,y) = \delta_{ij}, \qquad (14) $$

i.e., $f_j(x,y) \circledast h_i(x,y)$ is unity only for filter $i$ and image
class $j = i$. Thus only one of the M SDFs $\{h_m(x,y)\}$
yields an output of unity, whereas all the $M - 1$ other
SDFs yield zero outputs. The filter with the unity
output thus determines the class of the input object.

Following our general procedure in Secs. III and IV.A,
we describe each of these M SDFs as a different
weighted linear combination of all M training set images
$f_m(x,y)$, i.e.,

$$ h_1(x,y) = \sum_m a_m f_m(x,y), \quad h_2(x,y) = \sum_m b_m f_m(x,y), \quad \ldots, \qquad (15) $$

or in general

$$ h_i(x,y) = \sum_m p_m f_m(x,y), \qquad (16) $$

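where each summation in Eqs. (15) and (16) is over all
M training set images. Following our earlier general
approach, we can write the coefficients $\mathbf{a}$, $\mathbf{b}$, etc. in Eq.
(15) that satisfy Eq. (14) as

$$ \mathbf{R}_2\mathbf{a} = \mathbf{u}_a = [1, 0, 0, 0, \ldots, 0]^T, \qquad (17a) $$

$$ \mathbf{R}_2\mathbf{b} = \mathbf{u}_b = [0, 1, 0, 0, \ldots, 0]^T, \qquad (17b) $$

$$ \mathbf{R}_2\mathbf{c} = \mathbf{u}_c = [0, 0, 1, 0, \ldots, 0]^T, \ \text{etc.} \qquad (17c) $$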

Each $\mathbf{u}_m$ vector in Eqs. (17) has M elements and contains
$M - 1$ zeros and one 1. The location of the single
1 is different in each of the vectors. For $\mathbf{u}_a$, the first
element is a 1; for $\mathbf{u}_b$, the second element is a 1; etc. The
elements of the different vectors $\mathbf{a}$, $\mathbf{b}$, etc. in Eqs. (17)
are the linear coefficients in the corresponding SDF
equations in (15). Thus the M SDFs in Eq. (15) that
satisfy Eq. (14) are described by

$$ \mathbf{a} = \mathbf{R}_2^{-1}\mathbf{u}_a, \quad \mathbf{b} = \mathbf{R}_2^{-1}\mathbf{u}_b, \quad \mathbf{c} = \mathbf{R}_2^{-1}\mathbf{u}_c, \ \text{etc.} \qquad (18) $$

As seen by inspection of the $\mathbf{u}_m$ vectors in Eq. (17), filter
$h_1$ (described by $\mathbf{a}$) yields unity output for the image $f_1$
of class one and zero for all other image classes. Filter
$h_2$ (described by $\mathbf{b}$) yields unity output for the image $f_2$
of class two and zero for all other $M - 1$ image classes,
etc. This is as required by Eq. (14).
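
A two-class numerical instance of Eqs. (17) and (18) may help fix ideas. The two three-element "images" below are hypothetical and chosen only so that the arithmetic is easy to follow; images are treated as vectors and correlation as the inner product, as in Sec. III.

```latex
\mathbf{f}_1 = [1, 1, 0]^T, \quad \mathbf{f}_2 = [0, 1, 1]^T, \qquad
\mathbf{R}_2 = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
\mathbf{R}_2^{-1} = \frac{1}{3}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix},
\\[4pt]
\mathbf{a} = \mathbf{R}_2^{-1}[1, 0]^T = [\tfrac{2}{3}, -\tfrac{1}{3}]^T, \qquad
\mathbf{b} = \mathbf{R}_2^{-1}[0, 1]^T = [-\tfrac{1}{3}, \tfrac{2}{3}]^T,
\\[4pt]
\mathbf{h}_1 = \tfrac{2}{3}\mathbf{f}_1 - \tfrac{1}{3}\mathbf{f}_2
             = [\tfrac{2}{3}, \tfrac{1}{3}, -\tfrac{1}{3}]^T, \qquad
\mathbf{h}_2 = -\tfrac{1}{3}\mathbf{f}_1 + \tfrac{2}{3}\mathbf{f}_2
             = [-\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{2}{3}]^T,
\\[4pt]
\mathbf{f}_1 \cdot \mathbf{h}_1 = 1, \quad
\mathbf{f}_2 \cdot \mathbf{h}_1 = 0, \quad
\mathbf{f}_1 \cdot \mathbf{h}_2 = 0, \quad
\mathbf{f}_2 \cdot \mathbf{h}_2 = 1 .
```

Each filter thus gives unity output for its own class and zero for the other, as Eq. (14) requires.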

Since all these filter functions are mutually orthogonal,
we refer to this type of SDF as a mutual orthogonal
filter (MOF) SDF. The problem formulation advanced
above is similar to that of the generalized matched filter
as described in Ref. 4 and the decorrelation matrix filter
synthesis described in Ref. 10. However, our present
formulation is in terms of our general description using
the correlation matrix of the training set of data.

In Ref. 4, the filter function was synthesized from
Fourier coefficients of each object and is thus the
frequency-domain filter synthesis dual of the earlier$^{8}$ use
of a linear combination of multiple correlation-plane
outputs. Instead of forming a linear combination of M
multiple correlation-plane outputs (each correlation
using an MSF of one class of object), the M MSFs are
synthesized so that only one of the M correlation-plane
outputs yields a peak value near the maximum. Our
algorithm for synthesis of such filters in Eqs. (18) describes
such generalized matched filter synthesis using
the original images rather than the Fourier coefficients
of each image. In Ref. 4, only interclass discrimination
was discussed (rather than intraclass recognition of
distorted views of an object). In Sec. IV.C, we will extend
this MOF SDF to include both intraclass and
interclass object recognition.

In Ref. 10, a Gram-Schmidt basis function selection
technique was used to assemble a Gram-Schmidt correlation
matrix. In this approach, the first basis function
contains information only associated with the first
image $f_1$; the second basis function contains only the
new image information present in the second image $f_2$
that is not also present in the first image $f_1$; the third
basis function contains the new information present in
$f_3$ but not previously included in $f_1$ and $f_2$; etc. If the
first row and column of this Gram-Schmidt matrix are set
equal to zero (in Ref. 10, this was achieved by multiplying
this matrix by a decorrelation matrix), all
training set image information associated with $f_1$ is removed,
and the filter synthesized from
this reduced matrix will yield zero output when correlated
with $f_1$. Extensions of this technique to the
decorrelation of the training set of data for the other object
classes follow directly.

Our present formulation in Eqs. (17) and (18) is much
simpler and more easily implemented, and it is cast in
the same general matrix-vector form as that of Eqs. (11)
and (12). We refer to the SDF in Eqs. (17) and (18) as
an interclass MOF SDF or simply as an MOF SDF.


C.  MOF  SDF  Synthesis  for  Intraclass  and  Interclass 
Pattern  Recognition 

We now combine our ECP intraclass SDF (Sec. IV.A)
and our MOF interclass SDF (Sec. IV.B) formulations
to describe the synthesis of an MOF SDF for both
intraclass and interclass pattern recognition. We describe
the algorithm for a three-class problem, for a
two-class problem, and then we generalize to the case
of an M-class problem. This type of SDF is appropriate
for pattern recognition applications in which the input
object can be a member of several classes and when
different distorted versions of the input object can be
expected. In such a case, we must ensure interclass
discrimination between objects of different classes and
intraclass recognition of distorted versions of one object
as members of the same object class. We consider a
three-class pattern recognition problem with $N_1$ images
of one object $\{f_{ai}(x,y)\}$ of class a, $N_2$ images of the class
b object $\{f_{bi}(x,y)\}$, and $N_3$ images of a class c object
$\{f_{ci}(x,y)\}$ used as the training set. As before, each of
these training sets of objects consists of different distorted
versions of one object.
We desire to synthesize three filter functions $h_a(x,y)$,
$h_b(x,y)$, and $h_c(x,y)$ that satisfy

$$ f_{ni}(x,y) \circledast h_m(x,y) = \delta_{nm} \qquad (19) $$

for all members $i$ of each separate object class (the output
is unity when $m = n$ and zero otherwise).
We describe the SDFs for our three-class problem as
linear combinations of the entire training set of data
$\{f_n(x,y)\} = \{f_{ai}(x,y), f_{bi}(x,y), f_{ci}(x,y)\}$, i.e.,

$$ h_a(x,y) = \sum_n a_n f_n(x,y), \quad h_b(x,y) = \sum_n b_n f_n(x,y), \quad h_c(x,y) = \sum_n c_n f_n(x,y), \qquad (20) $$

where all summations in (20) are over the entire $N = N_1
+ N_2 + N_3$ training set of images. By directly extending
our results in Secs. IV.A and IV.B, we form the full
$(N_1 + N_2 + N_3) \times (N_1 + N_2 + N_3)$ correlation matrix
$\mathbf{R}_3$. By analogy with Eq. (17), we can then describe the
three SDFs in Eqs. (20) subject to the conditions in Eq.
(19) by the vectors $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ (each of dimensionality
$N_1 + N_2 + N_3$) as

$$ \mathbf{R}_3\mathbf{a} = \mathbf{u}_a = [1, \ldots, 1;\ 0, \ldots, 0;\ 0, \ldots, 0]^T, \qquad (21a) $$

$$ \mathbf{R}_3\mathbf{b} = \mathbf{u}_b = [0, \ldots, 0;\ 1, \ldots, 1;\ 0, \ldots, 0]^T, \qquad (21b) $$

$$ \mathbf{R}_3\mathbf{c} = \mathbf{u}_c = [0, \ldots, 0;\ 0, \ldots, 0;\ 1, \ldots, 1]^T. \qquad (21c) $$

In Eqs. (21), $\mathbf{u}_a$, $\mathbf{u}_b$, and $\mathbf{u}_c$ each contain $N_1 + N_2 + N_3$
elements with only the first $N_1$, the central $N_2$, or the last
$N_3$ elements being 1, respectively, and with all other
elements being zero. The matrix-vector constraints in
Eqs. (21) thus correspond to those in Eq. (19), i.e., $\mathbf{a}$, and
hence $h_a(x,y)$, is required to have a unity correlation
output for the $N_1$ objects in class a and zero for the
other training set images. Similar remarks follow for
$\mathbf{b}$ and $\mathbf{c}$ and equivalently for the associated filters
$h_b(x,y)$ and $h_c(x,y)$.

The three SDFs in Eqs. (20) are thus defined by

$$ \mathbf{a} = \mathbf{R}_3^{-1}\mathbf{u}_a, \quad \mathbf{b} = \mathbf{R}_3^{-1}\mathbf{u}_b, \quad \mathbf{c} = \mathbf{R}_3^{-1}\mathbf{u}_c, \qquad (22) $$

analogously to Eqs. (18). They thus satisfy a three-class
intraclass and interclass pattern recognition problem.
The extension to M classes with N training set images
per class results in an increase in the size of the correlation
matrix (to $MN \times MN$) and an increase in the
dimensionality of the coefficient vectors to $MN$. We
refer to such filters as intraclass and interclass MOF
SDFs or simply as MOF SDFs.
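
The control vectors of Eqs. (21) follow mechanically from the number of training images per class. The sketch below is illustrative only: the function name `mof_control_vectors` and the example class sizes are hypothetical, and the training images are assumed to be ordered class by class, as in the text.

```python
def mof_control_vectors(counts):
    """Build one control vector per class for an intraclass/interclass MOF SDF.

    counts[k] is the number of training images in class k. Vector k has ones
    in the positions belonging to class k and zeros elsewhere, as in Eqs. (21).
    """
    total = sum(counts)
    vectors = []
    start = 0
    for n in counts:
        u = [0] * total
        u[start:start + n] = [1] * n  # block of ones for this class
        vectors.append(u)
        start += n
    return vectors


# Hypothetical three-class problem with N1 = 2, N2 = 3, N3 = 1 training images.
u_a, u_b, u_c = mof_control_vectors([2, 3, 1])
```

Each vector would then be multiplied by the inverse of the full correlation matrix, as in Eq. (22), to obtain the coefficients of the corresponding filter.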

For smaller problems such as two-class pattern recognition
applications requiring intraclass distortion-invariance,
considerable simplifications are possible.$^{5}$
We can use a single SDF $h(x,y)$ described by

$$ \mathbf{a} = \mathbf{R}^{-1}\mathbf{u}_a, \qquad (23) $$

where $\mathbf{u}_a = [1, \ldots, 1;\ 0, \ldots, 0]^T$ contains $N_1$ ones and
$N_2$ zeros and where $\mathbf{R}$ is here the two-class correlation matrix
of dimensionality $N_1 + N_2$. If the correlation peak
output is above (or below) a 0.5 threshold level (halfway
between the two required zero and one output
levels), we choose class one (or class two) for the object
class. Since a zero correlation output or a correlation
output below 0.5 can also result when no input object
is present, this approach in Eq. (23) is appropriate only
in restricted applications. We can modify Eq. (23)
using $\mathbf{u}_a = [1, \ldots, 1;\ -1, \ldots, -1]^T$. In this case, if the
correlation peak value is above 0.5 (or below $-0.5$), we
select class one (or class two) as the object class. Such
an approach is attractive for digital correlators but not
for conventional optical correlators using intensity detectors.
(Such correlators provide unipolar correlation
outputs only.) For now, we only note that for two-class
intraclass pattern recognition problems, a single SDF
in general suffices if two different correlation plane
detection threshold levels are used with the class of the
input object determined from the correlation peak
value. Such multilevel nonredundant filter (NRF)
SDFs are discussed further in Sec. IV.D.
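
The bipolar decision rule just described can be written as a short function. This is a minimal sketch for a digital correlator: the function name and the convention of returning `None` when neither threshold is crossed (for example, when no object is present) are assumptions made for illustration, not part of the paper.

```python
def classify_two_class(peak, threshold=0.5):
    """Classify from a bipolar correlation peak (targets +1 and -1).

    Returns 1 if the peak is above +threshold, 2 if it is below -threshold,
    and None otherwise (no decision, e.g. no object in the input plane).
    """
    if peak > threshold:
        return 1
    if peak < -threshold:
        return 2
    return None


# Hypothetical peak values read from a digital correlation plane.
decisions = [classify_two_class(p) for p in (0.93, -0.80, 0.10)]
```

Because the two class targets sit on either side of zero, a near-zero output is no longer confused with class two, which is the motivation the text gives for the $\pm 1$ control vector.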

D.  Multilevel  NRF  SDFs 

We now generalize our three-class intraclass and interclass
pattern recognition example in Sec. IV.C to the
use of a single SDF. This SDF $h(x,y)$ is required to give
outputs of 1, 2, and 3 for objects in classes one, two, and
three, respectively. (Other appropriate constants can
be selected.) We refer to such a filter as a multilevel
nonredundant filter (NRF) SDF. The filter requirement
is described by

$$ f_n(x,y) \circledast h(x,y) = n, \qquad (24) $$

where the correlation output $n = 1$ for objects $\{f_{1i}(x,y)\}$
in class one, $n = 2$ for objects $\{f_{2i}(x,y)\}$ in class two, etc.
With $N_1$, $N_2$, and $N_3$ training set images for the three
classes, respectively, $h(x,y)$ is described by

$$ h(x,y) = \sum_m a_m f_m(x,y), \qquad (25) $$

where $\{f_m(x,y)\} = \{f_{1i}(x,y), f_{2i}(x,y), f_{3i}(x,y)\}$ contains all
$N_1 + N_2 + N_3$ training set images and where the summation
in Eq. (25) is over all $N_1 + N_2 + N_3$ images. The vector
$\mathbf{a}$ that describes the $h(x,y)$ that satisfies Eq. (24) is

$$ \mathbf{a} = \mathbf{R}_3^{-1}\mathbf{u}_3, \qquad (26) $$

where $\mathbf{R}_3$ is the $(N_1 + N_2 + N_3) \times (N_1 + N_2 + N_3)$ correlation
matrix and where $\mathbf{u}_3 = [1, \ldots, 1;\ 2, \ldots, 2;\ 3, \ldots, 3]^T$
contains $N_1$ ones, $N_2$ twos, and $N_3$ threes.
Extensions of this multilevel NRF SDF to more classes
are straightforward. However, more stringent detector
plane requirements and reduced performance can be
expected as the number of restrictions placed on such
a single SDF filter is increased. Thus, if this technique
is to be used for more than three or four classes
of data, more advanced preprocessing and image
training set selection techniques should be considered.$^{17}$
Such issues will be addressed in subsequent journal
papers.
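
The control vector of Eq. (26) and one possible way of reading the class from the peak value can be sketched as follows. The function names and the nearest-level decision rule with a `tolerance` parameter are assumptions for illustration; the paper specifies only the target output levels, not how the detector thresholds are set.

```python
def multilevel_control_vector(counts):
    """Control vector for a multilevel NRF SDF: level k repeated counts[k-1] times."""
    u = []
    for level, n in enumerate(counts, start=1):
        u.extend([level] * n)
    return u


def classify_peak(peak, num_classes, tolerance=0.5):
    """Map a correlation peak to the nearest integer level, if close enough.

    Returns the class number (1..num_classes), or None when the peak is not
    within `tolerance` of a valid level (a hypothetical rejection rule).
    """
    level = round(peak)
    if 1 <= level <= num_classes and abs(peak - level) < tolerance:
        return level
    return None


# Hypothetical class sizes N1 = 2, N2 = 1, N3 = 3.
u3 = multilevel_control_vector([2, 1, 3])
labels = [classify_peak(p, 3) for p in (2.1, 0.2, 3.4, 3.6)]
```

The shrinking gap between adjacent levels as more classes are added is one way to see why the text expects more stringent detector plane requirements for a single multilevel filter.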

E.  K-tuple  NRF  SDFs 

As implied in our discussion in Sec. IV.D, a single
SDF for multiclass intraclass and interclass recognition
is possible conceptually but may not yield acceptable
performance when noise and other issues are included.
The specific nature of the distortions to be considered
and the nature of the different object classes to be
distinguished plus the type and amount of noise to be
expected will determine the performance obtained. If a
large number of object classes must be considered, the
conventional MOF SDFs (Secs. IV.B and IV.C) would
require the use of M SDFs (for an M-class problem) and
the scanning of M 2-D output correlation planes. For
such pattern recognition problems, an alternate technique
we refer to as a K-tuple NRF SDF technique may
be more appropriate.

15 May 1984 / Vol. 23, No. 10 / APPLIED OPTICS 1625

In this SDF algorithm, we consider an M-class pattern
recognition problem and the use of only K SDFs,
where K is chosen to satisfy $M \le 2^K$. In this case, we
denote the K correlation outputs at the same $(x,y)$
spatial location in all K correlation planes by
$c_1, c_2, \ldots, c_K$. We consider the case of binary correlation
plane threshold levels of 0 or 1 as in Secs. IV.B and
IV.C. For each correlation plane point, we thus have
a K-tuple binary vector $\mathbf{c}$. Using conventional binary
(Boolean) encoding, each $\mathbf{c}$ vector can thus
represent up to $2^K$ different numbers. In our present
M-class pattern recognition problem, this means that
the corresponding $\mathbf{c}$ value can determine in which of $M
= 2^K$ classes the input object lies (the object at the
associated position in the input plane). We refer to this
as the K-tuple binary-level NRF SDF or simply as a
K-tuple NRF SDF.
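
The encoding idea can be sketched directly. The function names below are hypothetical, classes are indexed from zero for convenience, and the most-significant-bit-first ordering is an assumed convention chosen to match plain binary counting.

```python
def min_filters(num_classes):
    """Smallest K such that num_classes <= 2**K."""
    k = 0
    while 2 ** k < num_classes:
        k += 1
    return k


def class_code(class_index, k):
    """K-tuple of 0/1 target outputs for a zero-based class index (MSB first)."""
    return tuple((class_index >> (k - 1 - i)) & 1 for i in range(k))


def decode(bits):
    """Recover the zero-based class index from thresholded correlation outputs."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


K = min_filters(4)                            # four classes need two filters
codes = [class_code(c, K) for c in range(4)]  # target outputs per class
```

With `K = 2` this yields the four output pairs (0,0), (0,1), (1,0), and (1,1), so only two correlation planes need to be scanned instead of four.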

The formal description of such SDFs is best presented
only for a $K = 2$ filter SDF case, i.e., a four-class
pattern recognition problem. Generalizations beyond
this case should follow directly, but formally writing the
associated matrix-vector equations results in unneeded
notational complexity that will not further advance
understanding of the basic concepts. We thus consider
only an $M = 4$ class intraclass pattern recognition
problem and the use of $K = 2$ SDFs. We describe the
training sets of data for these four object classes by
$\{f_{1i}(x,y)\}$, $\{f_{2i}(x,y)\}$, etc. We assume that there are $N_1$
training set images in class one, $N_2$ images in class two,
etc. For notational simplicity, we assume $N = N_1 = N_2
= N_3 = N_4$ or $4N$ total training set images. We describe
the full $4N$ training set of images by $\{f_n(x,y)\}$, the associated
correlation matrix by $\mathbf{R}$ (it is of dimensionality
$4N \times 4N$), and the two SDFs by $h_a(x,y)$ and $h_b(x,y)$.

We require the two correlation outputs of the general
input image $f(x,y)$ and the two filters $h_a(x,y)$ and
$h_b(x,y)$ to satisfy the truth table in Table I. The four
possible combinations of the binary (0 and 1) correlation
plane outputs are used to denote in which of the four
classes the input object lies. We use correlation plane
levels of 0 and 1 with no loss of generality. Although a
zero output level can correspond to no object, this is
easily altered by selecting any other nonzero coefficient
for the desired correlation plane output level in our filter
synthesis equation (as noted at the end of Sec. IV.D).
We denote the two SDFs by

$$ h_a(x,y) = \sum_n a_n f_n(x,y), \quad h_b(x,y) = \sum_n b_n f_n(x,y), \qquad (27) $$

where the summations in Eqs. (27) are over the $4N$
training set images. In matrix-vector form, we write
the truth table in Table I as

$$ \mathbf{R}[\mathbf{a}\ \mathbf{b}] = \begin{bmatrix} 0 & 0 \\ \vdots & \vdots \\ 0 & 1 \\ \vdots & \vdots \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 1 \\ \vdots & \vdots \end{bmatrix} \qquad (28a) $$

or

$$ \mathbf{R}[\mathbf{a}\ \mathbf{b}] = [\mathbf{u}_1\ \mathbf{u}_2], \qquad (28b) $$

where $\mathbf{R}$ is $4N \times 4N$, $\mathbf{a}$ and $\mathbf{b}$ define $h_a$ and $h_b$ in Eqs.
(27), and where the right-hand side in Eqs. (28)
consists of two column vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ with N element
pairs equal to (0,0), N element pairs equal to (0,1), etc.
The solution to Eqs. (28) for the vectors $\mathbf{a}$ and $\mathbf{b}$ that
define $h_a(x,y)$ and $h_b(x,y)$ in Eqs. (27) to satisfy Table
I or Eq. (28) is thus

$$ [\mathbf{a}\ \mathbf{b}] = \mathbf{R}^{-1}[\mathbf{u}_1\ \mathbf{u}_2]. \qquad (29) $$

Table I. Truth Table for a K-tuple Nonredundant SDF. The Case of
M = 4 Classes and K = 2 Filters Is Shown.

    Object class    Output of h_a    Output of h_b
         1                0                0
         2                0                1
         3                1                0
         4                1                1

A variation of this K-tuple NRF formulation was first
advanced  in  Ref.  18  for  coherent  optical  systems  and 
then  extended  to  noncoherent  optical  correlators  in  Ref. 
19.  Neither  of  these  formulations  used  a  correlation 
matrix  observation  space  to  describe  synthesis  of  the 
required  filter,  however. 
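
As a minimal worked case of Eq. (29), take $N = 1$ image per class and suppose (purely for illustration) that the four training images are orthonormal, so that $\mathbf{R}$ is the $4 \times 4$ identity matrix. The class-to-code assignment follows the (0,0), (0,1), ... ordering stated above.

```latex
\mathbf{u}_1 = [0, 0, 1, 1]^T, \qquad \mathbf{u}_2 = [0, 1, 0, 1]^T, \qquad
[\mathbf{a}\ \mathbf{b}] = \mathbf{R}^{-1}[\mathbf{u}_1\ \mathbf{u}_2]
                         = [\mathbf{u}_1\ \mathbf{u}_2],
\\[4pt]
\mathbf{h}_a = \mathbf{f}_3 + \mathbf{f}_4, \qquad
\mathbf{h}_b = \mathbf{f}_2 + \mathbf{f}_4,
\\[4pt]
(\mathbf{f}_1 \cdot \mathbf{h}_a,\ \mathbf{f}_1 \cdot \mathbf{h}_b) = (0, 0), \quad
(\mathbf{f}_2 \cdot \mathbf{h}_a,\ \mathbf{f}_2 \cdot \mathbf{h}_b) = (0, 1), \quad
(\mathbf{f}_3 \cdot \mathbf{h}_a,\ \mathbf{f}_3 \cdot \mathbf{h}_b) = (1, 0), \quad
(\mathbf{f}_4 \cdot \mathbf{h}_a,\ \mathbf{f}_4 \cdot \mathbf{h}_b) = (1, 1).
```

For real training images $\mathbf{R}$ is not the identity, and the coefficients come from solving Eq. (28b) rather than from reading off $\mathbf{u}_1$ and $\mathbf{u}_2$ directly.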

V.  Summary  and  Conclusion 

From Sec. IV, we have shown and detailed how five
different types of synthetic discriminant function for
different pattern recognition problems (intraclass recognition,
interclass discrimination, and both intraclass
and interclass object identification) can be formulated
as the same general matrix-vector equation. Specifically,
the vectors that describe the SDFs equal the inverse
of a correlation matrix $\mathbf{R}$ times a control vector
(containing ones, zeros, or other similar constants).
Inspection of Eqs. (11) and (12), (17) and (18), (21) and
(22), (23), (26), and (28) and (29) shows that all expressions
for such a filter computation and synthesis are
of the same general matrix-vector linear algebraic
equation form.

In  Sec.  Ill,  we  provided  a  general  description  of  a 
distortion-invariant  matched  spatial  filter  and  showed 
for  general  basis  functions  that  all  such  MSF  pattern 
recognition  problems  involve  inversion  of  a  correlation 
matrix  and  multiplication  by  a  different  external  vector. 
The  size  of  and  the  elements  of  the  correlation  matrix 
and  the  elements  chosen  for  the  external  vector  differ 
for  specific  pattern  recognition  applications,  but  the 
same  general  format  is  retained  throughout  all  types  of 
SDF  for  different  applications.  In  Sec.  II,  we  described 
the  philosophy  and  details  of  SDF  synthesis  in  terms  of 
a  modified  hyperspace  and  feature-vector  system. 
This hyperspace description (using 2-D basis functions
and 2-D discriminant functions), our derivation
of the use of a correlation matrix observation space
(independent of the basis functions used), and the
general unification and detailed description of five
different types of SDF are the original contributions in
this paper. Other variations of these concepts (such as
combinations of multilevel K-tuple nonredundant filter
SDFs) are obvious but were not detailed. Initial
experiments$^{20}$ have shown that the SDFs described
herein give excellent performance even in the
presence of noise.$^{17}$ These results and the issue of
training set selection and the theoretical basis for the
performance found using such filters will be the subject
of future journal papers. Our intent in this paper was
to provide the initial details and foundations of a unified
SDF filter synthesis technique for multiclass
distortion-invariant shift-invariant pattern recognition.


The  support  of  this  research  by  grants  from  the  Air 

9. ACOUSTO-OPTIC LINEAR ALGEBRA
PROCESSORS:  ARCHITECTURES, 
ALGORITHMS  AND  APPLICATIONS 


Acoustooptic  Linear  Algebra  Processors: 
Architectures,  Algorithms,  and 
Applications 

DAVID  CASASENT,  FELLOW,  IEEE 
Invited  Paper 


Architectures, algorithms, and applications for systolic processors
are described with attention to the realization of parallel algorithms
on various optical systolic array processors. Systolic processors for
matrices with special structure and matrices of general structure,
and the realization of matrix-vector, matrix-matrix, and triple-matrix
products on such architectures are described. Parallel algorithms
for direct and indirect solutions to systems of linear algebraic
equations and their implementation on optical systolic processors
are detailed with attention to the pipelining and flow of data and
operations. Parallel algorithms and their optical realization for LU
and QR matrix decomposition are specifically detailed. These represent
the fundamental operations necessary in the implementation
of least squares, eigenvalue, and SVD solutions. Specific applications
(e.g., the solution of partial differential equations, adaptive
noise cancellation, and optimal control) are described to typify the
use of matrix processors in modern advanced signal processing.


I. Introduction

Optical processors have long intrigued researchers and
data processors because of their parallelism, high computational
rates, small size and weight, and their low power
dissipation and cost. Most optical processors have been
special-purpose systems performing Fourier transforms and
correlations. However, in the past three years, more general-purpose
optical processors have emerged that perform
matrix-vector and various linear algebra operations. These
optical linear algebra processor architectures exhibit pipelining
and both local and global interconnections and are
generally referred to as optical systolic array processors. In
this paper, several architectures and various algorithms for
the use of such systems in various applications are reviewed.
Because of the parallel nature of these optical
systolic array processors, parallel linear algebra algorithms
are essential and the flow of data and operations in the
system as well as input and output issues merit attention.

Many systolic [1], wavefront [2], and concurrent [3] parallel
digital architectures have been suggested in which most
processing elements are standard and in which each
processing element is always kept active as data flow across
the element array. Conventional algorithms (e.g., the International
Mathematical and Statistics Library, IMSL) are
appropriate for uniprocessors but not for systolic array
architectures. A wealth of research on algorithms for multiprocessors
and parallel algorithms suitable for systolic
processors exists [4]. However, systolic architectures are often
devised to implement different algorithms, and the algorithm
and system design for complex operations is often
complicated by the requirement to utilize only local communication
and yet maintain efficiency in the systolic array.
Thus even within the digital systolic array community, the
development of appropriate algorithms for systolic architectures
is a current area of intensive research. No attempt will be made to
review digital systolic architectures and algorithms. Rather,
attention will be focused only on optical systolic processors
and parallel algorithms thus far developed for such systems.
Since research on appropriate algorithms and architectures
for optical systolic processors is still in the formulation
stage, and since different algorithms and implementations
detailed are appropriate for each different optical systolic
architecture proposed, I shall concentrate on the basic
linear algebra operations of matrix-vector, matrix-matrix,
and triple-matrix multiplications, plus matrix inversion,
direct and indirect solutions of systems of linear algebraic
equations (LAEs), and matrix decomposition. These represent
the basic linear algebra operations required for more
advanced problems such as least squares, eigenvalue, and
singular value decomposition (SVD) algorithms needed in
advanced modern signal processing.

Optical systolic processors represent an attractive general-purpose
system for performing various matrix-vector
and linear algebra operations with the high speed and
parallelism of the optics being fully utilized. Since many
image, pattern recognition, and signal processing problems
can be and generally are formulated as matrix-vector problems,
such optical processors represent a general-purpose
and flexible system in which one optical processor can
solve a wide variety of problems in many different applications.
By examples and specific case studies, these features
will be shown.

Many different optical matrix-vector processors have been described as far back as 1965.
A survey of these architectures is available [5]. Several of the current systolic
architectures advanced in the past two years are reviewed in Section II. These are
separated into architectures for matrices with special structure and those for general
matrices. How these various architectures achieve the basic operations of matrix-vector
and matrix-matrix multiplication is detailed. The solution of systems of linear algebraic
equations is a central problem in engineering and computational mathematics. Thus the
basic indirect (Section III) and direct (Section IV) parallel algorithms for this
fundamental operation are reviewed and optical implementations for each are detailed, with
attention to pipelining and flow of data and operations in the system. The extension of
these basic operations to advanced problems such as least squares, eigenvector, and SVD
solutions is then briefly reviewed (Section V). Three specific applications are then
briefly discussed to detail how the basic operations (the vector inner product,
matrix-vector multiplication, and the solution of systems of LAEs) and matrices with
special structure arise. The applications chosen include: the solution of partial
differential equations (Section VI-A), adaptive noise filtering (Section VI-B), and
optimal control requiring the solution of quadratic matrix equations (Section VI-C).
Accuracy and performance issues are then addressed together with a summary and conclusion
in Section VII.

II.  Optical  Linear  Algebra  Processor  Architectures 

A plethora of optical matrix-vector and systolic architectures has been described in the
past several years. These include: the original Naval Ocean Systems Center [6], Stanford
[7], and Carnegie-Mellon University [8] systems, beam modulator systems using
charge-coupled device (CCD) shift register detector readout [9], beam modulator systems
without CCD shift register readout [10], banded and Toeplitz matrix acoustooptic (AO)
systems [10], iterative AO systolic architectures [10], vector outer product systems using
time-integrating detectors and crossed AO cells or two-dimensional (2-D) spatial light
modulators (SLMs) [11], an engagement-mode processor using multichannel AO cells [12],
frequency-multiplexed AO processors [13], an engagement-mode (RUBIC) cube processor using
2-D SLMs [14], and architectures combining one-dimensional (1-D) and 2-D SLMs [15]. Many
optical systolic architectures for improved accuracy and performance have also been
described. These include: accurate vector outer product processors [16], an accurate RUBIC
cube processor [17], architectures using 1-D and binary 2-D SLMs [18], and an accurate
engagement-mode processor using multichannel AO cells (the systolic AO binary convolver,
SAOBIC) [19]. Several of these architectures are reviewed elsewhere in this issue [20],
[21].

In the present paper, only analog optical systolic processors using single-channel AO
cells are considered (such systems are readily available with present component
technology). All of the systems and algorithms described in this paper can be extended
fairly directly to use multichannel AO cells and binary 2-D SLMs. Such extensions increase
the number of operations performed per second. It appears best to use the added dimension
of such systems to achieve improved system accuracy (as accomplished, for example, in the
SAOBIC architecture [19]) rather than increased computational rates [22]. The method by
which such advanced optical systolic processors achieve digital accuracy uses data
encoding and the basic algorithm for digital multiplication by convolution first described
in [23] (and first applied to optical architectures in [24]). Examples of several such
architectures are described elsewhere in this issue [20], [21].

Initial laboratory demonstrations have been provided for several of the architectures and
algorithms described [6]-[8], [18], [22], [25]. Many of the optical systems noted above
produce 2-D output data in parallel. Use of such systems in most applications requires
advanced detector readout methods with parallel A/Ds, detectors, and parallel high-speed
post-processing logic. In incorporating such 2-D output systems into the algorithms
described, parallel readout of one row or column of the 2-D output is assumed. In the
block diagrams that will be used to describe the various algorithms and architectures, a
generic optical systolic processor that performs one matrix-vector multiplication every
bit time T_B is assumed (with a parallel linear input and a parallel linear output array).
Most of the algorithms described can be implemented on the various optical systolic array
processors, with associated modifications to the data flow and computational time that
depend upon the specific processor used.

A.  Systems  for  Matrices  with  Special  Structure 

As the first class of AO systolic processors, we consider systems suitable for matrices
with special structure. The system of Fig. 1 consists of N point modulators whose outputs
are imaged through different regions of an AO cell and onto different output detectors.
AO cells and conventional AO architectures are detailed elsewhere [25], but a simplified
description is included here for completeness. Electrical data fed to an AO cell are
converted to an acoustic wave, which travels the length of the cell and introduces spatial
and temporal variations in the dielectric constant of the acoustic material. When the data
reach the end of the cell, they are absorbed. When the cell is illuminated with light, the
amplitude or intensity of the light leaving the cell is modulated spatially in proportion
to the strength of the acoustic field in the cell (i.e., proportional to the strength of
the input electrical signal), and the light leaves the cell at an angle proportional to
the spatial frequency of the acoustic signal (i.e., proportional to the frequency of the
input electrical signal). These two properties of AO cells will be employed in the various
architectures described. For simplicity, we will omit the details of the single-sideband
filtering required in such architectures.

Fig. 1. Simplified schematic of a banded-matrix optical systolic processor using CCD shift
register (SR) detector readout (adapted from [9]).

Fig. 2. Simplified schematic of a banded-matrix and Toeplitz-matrix optical systolic
processor [10] with only a single detector. This architecture exhibits local and global
interconnections and performs a vector inner product as its basic operation.

With respect to Fig. 1, the elements of the vector are fed time-sequentially to the AO
cell. Each vector element is assigned a given time slot in the input electrical signal,
and the electrical power in each time slot is proportional to the desired vector element
value. We denote the length of the AO cell in time by T_A and the time duration of each
data packet or vector element by T_B. For simplicity, we assume N = T_A/T_B data packets
or pulses can be present in the cell (in practice, some time spacing, i.e., a guard band,
will be required between data pulses). The parameters N and T_B are set by the
time-bandwidth product (TBWP) of the AO cell, TBWP = T_A W_A = 1000 being a typical value,
where W_A is the bandwidth of the cell. We consider the use of the system of Fig. 1 to
form the matrix-vector product

$$\mathbf{A}\mathbf{b} = \mathbf{d} \tag{1}$$

where the matrix A is banded. We denote matrices and vectors by bold face upper and lower
case letters, respectively. The bandwidth of the matrix is the number of nonzero diagonals
(three in (1)).

The matrix-vector product in (1) can be accomplished optically on the system of Fig. 1 by
feeding the vector b to the AO cell time-sequentially as shown and the three diagonals of
the matrix as time-histories to the three input point modulators. The data modulation on
the electronic time-history input signals for this case is shown in Fig. 1 (time increases
from right to left in the figure). New data enter the system every T_B, with zero-valued
data packets of duration T_B placed between each vector and matrix element. When the input
point modulators are pulsed on, the light intensity leaving the AO cell is the
point-by-point product of the input data to the point modulators and the associated RF
input to the AO cell. These point-by-point products are collected on separate output
detectors. The contents of the output CCD detector array are then shifted down by one, and
at the next T_B the new point-by-point products are added (by charge accumulation) to the
shifted data previously present on the detectors. It is easily shown that the time-history
output from the single channel on the linear detector CCD shift-register (SR) array is the
desired matrix-vector product Ab = d in (1). This first AO systolic processor architecture
was described earlier by Caulfield et al. [9]. At each point in the AO cell (opposite an
input point modulator), the system performs a multiplication, and on the associated output
detector this scalar product is added to a prior value obtained from the neighboring focal
element. In this sense, this architecture is the optical equivalent of a conventional
digital systolic architecture [1].
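
The arithmetic behind this banded product can be sketched numerically. The Python fragment
below is only an illustration under stated assumptions: it accumulates the product one
diagonal at a time, as the point modulators are fed one diagonal each, but it does not
model the interleaved zero packets, the CCD shift timing, or any optical effect. The
function name and the 4 x 4 tridiagonal test matrix are hypothetical and do not come from
the paper.

```python
import numpy as np

def banded_matvec_by_diagonals(diagonals, b):
    """Accumulate d = A b one diagonal at a time.

    diagonals maps a diagonal offset k to the 1-D array np.diag(A, k).
    """
    n = len(b)
    d = np.zeros(n)
    for k, diag in diagonals.items():
        for i in range(n):
            j = i + k  # column reached by row i on the diagonal with offset k
            if 0 <= j < n:
                # np.diag(A, k) stores A[i, j] at position min(i, j)
                d[i] += diag[min(i, j)] * b[j]
    return d

# Hypothetical tridiagonal example (bandwidth three), not taken from the paper
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [3.0, 4.0, 5.0, 0.0],
              [0.0, 6.0, 7.0, 8.0],
              [0.0, 0.0, 9.0, 1.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])
diagonals = {k: np.diag(A, k) for k in (-1, 0, 1)}
d = banded_matvec_by_diagonals(diagonals, b)
```

Only the three nonzero diagonals are stored, which mirrors the observation that the number
of input channels needed is set by the bandwidth of the matrix rather than by its order.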

Another AO systolic architecture for multiplication of a banded matrix by a vector,
described by Casasent [10], [13], is shown in Fig. 2. In this system, the nonzero elements
of each row of the matrix are fed one row at a time in parallel to the input point
modulators and the vector data are fed time-sequentially to the AO cell. Each T_B, the
elements of the input matrix row are multiplied by the corresponding elements of the
vector b, and the sum of these scalar products is produced on a single output detector.
The integrating lens shown achieves this summation of partial products. Thus each T_B,
this architecture multiplies one row of the matrix by the associated elements of the
vector b, and one element of the vector Ab = d is produced. Other techniques to perform
banded matrix-vector multiplications exist that avoid the need for the CCD SR detector
readout required in the beam-modulator architecture of Fig. 1. The system of Fig. 2 is the
simplest since it requires only one output detector. Such architectures are quite
attractive for banded matrices since the number of input point modulators required need
only equal the bandwidth of the matrix, and only one output detector is necessary. Such
processors exhibit the local interconnection feature of digital systolic processors
together with a global interconnection feature unique to optical systems (i.e., addition
of all separate element products by use of an integrating lens). The system of Fig. 2 thus
performs one vector inner product each T_B. In designing optical systolic array
architectures, the unique global interconnection features, which have served optical
processing systems very well in the past, should not be abandoned, and thus optical
systolic architectures should not attempt to emulate the various digital systolic
architectures that are proposed. The subsequent optical systolic architectures to be
described make use of this philosophy.

Let us next consider an optical systolic processor to multiply a vector by a Toeplitz
matrix (the elements along a diagonal are constant in a Toeplitz matrix, i.e., the
elements of each row of the matrix are the same, shifted by one position). In this case,
the architecture of Fig. 2 again suffices [10]. Now, the nonzero elements of one row of
the matrix are fixed inputs (constant with time) to the input point modulators, and the
vector data are fed to the AO cell. Each T_B, one row of the matrix is multiplied by the
corresponding vector elements (the AO cell automatically achieves the time-delay shift
required to align the matrix elements with the proper vector elements) and summed (by the
integrating lens) to produce one element (a vector inner product) of the final
matrix-vector product on a single output detector. The number of input point modulators
required and the TBWP needed for the AO cell are determined by the size of the matrix.

Fig. 3. Simplified schematic of a Toeplitz-matrix optical systolic processor [10] for
matrices with large bandwidths.

An alternate architecture devised by Casasent [10] for the multiplication of a vector b by
a Toeplitz matrix A is shown in Fig. 3. In this system, the elements of b are fed
time-sequentially to one input point light modulator whose output uniformly illuminates an
AO cell fed with the data a in one column of the Toeplitz matrix A. The light distribution
leaving the AO cell at each T_B is a scalar-vector product, i.e., b_n a_n (where the input
to the point modulator b_n is the associated element of the vector b, and a_n contains all
of the elements of column n of the matrix A, properly apertured). This b_n a_n product is
imaged onto a linear output detector array. At the next T_B, b_{n+1} a_{n+1} is formed and
added to the previous scalar-vector product. Thus after N T_B (the integration time of the
detector) the entire matrix-vector product Ab = d is present on the output detectors. This
Toeplitz matrix-vector product is thus achieved as

$$\mathbf{A}\mathbf{b} = [\mathbf{a}_1 \cdots \mathbf{a}_N]\,[b_1 \cdots b_N]^T = b_1\mathbf{a}_1 + b_2\mathbf{a}_2 + \cdots = \mathbf{a} * \mathbf{b} \tag{2}$$

where a_n is the nth column of the matrix A and b_n is element n of the vector b. Since
all columns of A are shifted versions of each other, the matrix-vector product is simply
the convolution of the elements of b and the elements of one column a of A. We denote this
by a * b in (2) (where * denotes convolution). This formulation of a matrix-vector product
as a convolution is also employed in many high-accuracy digital optical systolic
processors.
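
The identity in (2) is easy to check numerically. The sketch below is a hedged
illustration, not a model of the optical hardware: it assumes a "full" Toeplitz matrix
whose column n is the sequence a shifted down by n positions, accumulates the scalar-vector
products b_n a_n on a simulated detector array, and compares the result with a direct
convolution. The helper names and the sample data are hypothetical.

```python
import numpy as np

def toeplitz_from_column(a, n_cols):
    """Build the Toeplitz matrix whose column n is `a` shifted down by n rows."""
    rows = len(a) + n_cols - 1
    A = np.zeros((rows, n_cols))
    for n in range(n_cols):
        A[n:n + len(a), n] = a
    return A

def time_integrating_product(A, b):
    """Add one scalar-vector product b_n * a_n to the detector array per bit time."""
    detector = np.zeros(A.shape[0])
    for n, b_n in enumerate(b):
        detector += b_n * A[:, n]  # charge accumulates over the N bit times
    return detector

# Hypothetical data, chosen only for illustration
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0, 7.0])
A = toeplitz_from_column(a, len(b))
d = time_integrating_product(A, b)
```

Under these assumptions the accumulated detector contents equal both the ordinary product
of A and b and the full convolution of a with b, which is the content of (2).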

Persons familiar with AO signal processors will recognize the system of Fig. 2 as a
space-integrating AO correlator and the system of Fig. 3 as a time-integrating
acoustooptic correlator. These architectures have existed and have been used for
correlation signal processing for many years [25]. Thus in retrospect, optical signal
processors have used systolic architectures for many years (but under different names).
The architecture of Fig. 3 is preferable for Toeplitz matrix applications when the
bandwidth of the matrix is large, and the architecture of Fig. 2 is preferable when the
length of the vector is large. Since convolution is commutative, the roles of the matrix
and vector can be reversed in either architecture, as the application or system
fabrication merits.

B.  Systems  for  General  Matrices

The architecture of Fig. 1 (or an associated architecture with parallel readout detectors)
can be extended to handle general matrices at a significant increase in computational
difficulty. The architectures of Figs. 2 and 3 are most appropriate for matrices with
special structure (banded or Toeplitz). Special techniques for circulant matrices (as
arise in FFTs) and other matrix structures are also possible and follow directly from
conventional linear algebra. In this subsection, we discuss optical systolic processors
suitable for the multiplication of a general matrix by a vector (i.e., when the matrix has
no specific structure). The two major architectures considered are an AO modulator and an
AO modulator-deflector. The modulator system is analogous to that of Fig. 1 but with
separate detectors with parallel readout (and a rearrangement of the method for feeding
data to the system). This system is described elsewhere [20] and is thus not detailed
here. Rather, the frequency-multiplexed modulator-deflector architecture of Fig. 4 is
described. In this latter system [13], M input point modulators are imaged through M
spatially separated regions of an AO cell, and the Fourier transform of the light
distribution leaving the AO cell is formed in the back focal plane of the lens, where it
is sensed by a linear output detector array with parallel outputs. This system is thus
topologically identical to that of Fig. 2 with the addition of parallel output detectors.
For simplicity, only five input point modulators are shown in Fig. 4.

We describe the operation of the optical systolic processor in Fig. 4 for the case when M
signals (vectors), each of length N and each on a separate temporal frequency, are present
simultaneously in the AO cell. We refer to this as frequency-multiplexing of the input AO
cell data. When the N input point modulators are pulsed on in parallel, the associated
input vector multiplies all M vectors in the AO cell. This produces the elements of M
separate vector inner products. Each vector inner product will leave the AO cell at a
different angle (proportional to the temporal frequency used for each of the M input
signal vectors to the AO cell). The Fourier transform lens thus forms each of these M
vector inner products on M separate output plane detectors, i.e., the system performs a
matrix-vector multiplication in parallel each T_B. During the next T_B, the matrix data in
the AO cell shift up by T_B. At this time, N input point modulators (spatially shifted up
by one) are pulsed on with new vector data and a new matrix-vector product (with the same
matrix as before) is formed in parallel on the output detectors. After M T_B, a
matrix-matrix product has been formed as M matrix-vector multiplications (one per T_B).
For square matrices (M = N), N^2 is the required TBWP of the AO cell. The motivation for
this architecture, devised by Casasent et al. [13], was that an AO modulator architecture
requires N = TBWP input point modulators and output detectors to fully utilize the
processing capability of the AO cell. In the AO modulator-deflector system of Fig. 4, the
bit time T_B and AO cell bandwidth W_A are traded (T_A W_A = N^2 = TBWP is fixed). The
resultant frequency-multiplexed architecture is easier to fabricate, can use larger T_B
intervals, and performs a more intensive basic operation (matrix-vector multiplication
versus a vector inner product) each T_B. For an N x N matrix-matrix multiplication, we
require an AO cell with TBWP = N^2, 2N - 1 input point modulators, and N detectors. If the
input point modulators are pulsed on with new vector data faster than every T_B, higher
computation rates are possible, and the modulator and deflector architectures achieve the
same computation rate, with the associated need to feed input data and collect output data
at a faster rate.

Fig. 4. Simplified schematic of a frequency-multiplexed general-matrix acoustooptic (AO)
systolic processor [13] with both local and global interconnections and with a
matrix-vector multiplication as its basic operation.

We now discuss how all of the basic linear algebra operations required can be accomplished
on the system of Fig. 4 by various data encoding choices. If the matrix A is fed to the
point modulators one row at a time in parallel, i.e., with its elements a_mn time and
space multiplexed as a[t, x], and the vector b is fed time-sequentially to the AO cell,
i.e., with its elements b_n encoded as b[t], then the matrix-vector product Ab = c is
formed one element c_m at a time as c(t) on a single output detector. In this case, the
degenerate system of Fig. 2 is adequate (or Fig. 4 without frequency multiplexing and with
only a single output detector used). In the case when A is fed one column in parallel per
T_B to the AO cell, i.e., its elements a_mn are encoded as a_mn = a[f, t], and all
elements b_n of b are fed in parallel to the point modulators as b_n = b[x] (i.e.,
space-multiplexed), then the matrix-vector product Ab = c is formed in parallel in space
on the output detectors, i.e., as c_m = c(x). In effect, the AO cell converts time to
space (i.e., t -> x), and the Fourier transform lens behind the AO cell converts temporal
frequencies to spatial coordinates (i.e., f -> x). Next, we consider matrix-matrix
multiplication. If A is fed to the AO cell one row at a time as a_mn = a[t, f] and B is
fed to the input point modulators as b_mn = b[t, x], the matrix product BA is produced one
row at a time in parallel on the output detectors. With the opposite encoding,
a_mn = a[f, t] and b_mn = b[x, t], the matrix product AB is produced one column at a time
in parallel on the output detectors. Reference [20] provides a tutorial description of the
frequency-multiplexed architecture of Fig. 4 for those readers less versed in optical
Fourier transforms.
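
The two matrix-matrix encodings can be mimicked with a small idealized model. The sketch
below is an assumption-laden illustration rather than a description of the actual system:
the AO cell is represented as an array cell[x, f] (position along the cell by frequency
channel), the point modulators as a vector over x, and the lens as a sum over x that keeps
the frequency channels separate. The shifting of data through the cell, guard bands, and
all noise sources are ignored, and the function names are hypothetical.

```python
import numpy as np

def matvec_step(cell, modulators):
    """One bit time of the idealized processor.

    cell[x, f] is the datum at cell position x on frequency channel f;
    modulators[x] is the point-modulator input facing position x.
    The lens sums over x and delivers one output per frequency channel.
    """
    return modulators @ cell

def product_BA(A, B):
    """Encoding a[t, f], b[t, x]: rows of B are fed in turn, rows of BA come out."""
    return np.vstack([matvec_step(A, B[m, :]) for m in range(B.shape[0])])

def product_AB(A, B):
    """Encoding a[f, t], b[x, t]: columns of B are fed in turn, columns of AB come out."""
    return np.column_stack([matvec_step(A.T, B[:, n]) for n in range(B.shape[1])])

# Hypothetical 4 x 4 test matrices
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
BA = product_BA(A, B)
AB = product_AB(A, B)
```

In this model the only difference between the two products is which index of A is mapped
to frequency, which is the sense in which the encoding choice selects the operation.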

In Table 1, we summarize the various operations that result from the different possible
encoding choices. The two matrix-matrix multiplication techniques, with the product matrix
fed back to the AO cell and a new matrix C fed to the input point modulators, produce the
triple-matrix products CBA or ABC. Various other operations can be performed on this
system using these basic functions. These are noted under applications in Table 1.
Subsequent sections will detail each of these. They are included in Table 1 at this time
for completeness. In general, these operations are accomplished by feeding back the output
of the system to the AO cell or the input point modulators. In subsequent sections, we
will assume that the matrix to be processed can be accommodated in half of the AO cell and
that one row or column of it can be accommodated by half of the input point modulators. If
this is not the case, matrix partitioning techniques are required. Such issues are not
included at present to simplify description of the algorithms and architectures.

Other encoding schemes are, of course, possible, but have thus far not been found useful.
These include a) A = a[t, f] and b[x], which yields A^T b in parallel; b) A = a[t, f] and
B = b[x, t], which yields A^T B; etc.


Table 1  Format Control for Flexibility and Data Flow in Fig. 4

| Operation | Notation | AO Cell | Point Modulators | Applications |
|---|---|---|---|---|
| Matrix-Vector Multiplication | Ab | b = b[t] | A = a[t, x] | Banded Matrix Multiplication; Solve Banded Matrix Problems; Solve Triangular Matrix Problems (Feedback to AO Cell or Point Modulators) |
| Matrix-Vector Multiplication | Ab | b = b[t] | A = a[x] (one row) | Solve Toeplitz Matrix Problem (Feedback to AO Cell) |
| Matrix-Vector Multiplication | Ab | A = a[t] (one column) | b = b[t] (serially) | Solve Toeplitz Matrix Problem (Feedback to Point Modulators) |
| Matrix-Vector Multiplication | Ab | A = a[f, t] | b = b[x] | Solve Systems of LAEs (Feedback to Point Modulators) |
| Matrix-Matrix Multiplication | BA | A = a[t, f] | B = b[t, x] | Triple Matrix Product CBA; LU Matrix Decomposition; Direct LAEs Solution by LU or QR; Least Squares Solution by LU or QR (Feedback to AO) |
| Matrix-Matrix Multiplication | AB | A = a[f, t] | B = b[x, t] | Triple Matrix Product ABC (Feedback to AO); QR Matrix Decomposition Without Vector Outer Product Processor (Feedback to Point Modulators and AO Cell) |

III.  Parallel  Systolic  Indirect  Algorithms  for  the
Solution  of  Systems  of  Linear  Algebraic  Equations

A wealth of literature exists on various algorithms for the solution of systems of linear
algebraic equations (LAEs), where we wish to find the vector solution x to the
matrix-vector equation

$$\mathbf{A}\mathbf{x} = \mathbf{b} \tag{4}$$

where all vectors and matrices are assumed to be of order N (i.e., the order of the system
of equations) throughout this paper. This one problem is central to many image and signal
processing applications, and thus we detail various solutions for LAEs in this and the
next section. The material in Sections III and IV draws heavily on several surveys of
operations achievable on optical systolic processors [26], [27] and associated journal
literature [8], [28]-[30]. As detailed by Rice [31] and others, the two major classes of
solutions to systems of LAEs are direct (matrix decomposition) and indirect (iterative).
Direct algorithms are discussed separately in Section IV.

Four linear iterative algorithms to solve the LAEs Ax = b can easily be identified. These
solutions emerge from the additive splitting of the coefficient matrix into

$$\mathbf{A} = \mathbf{D} - \mathbf{L} - \mathbf{U} \tag{5}$$

where D is a diagonal and nonsingular matrix, L is lower triangular (elements only on and
below the main diagonal), and U is upper triangular. The four iterative algorithms then
become [26], [27], [32], [33]: the Richardson algorithm (also called
simultaneous-displacement or semi-iterative, depending upon whether ω is constant with j)

$$\mathbf{x}(j+1) = \mathbf{x}(j) - \omega\mathbf{A}\mathbf{x}(j) + \omega\mathbf{b} \tag{6}$$

the Jacobi algorithm

$$\mathbf{x}(j+1) = [\mathbf{D}^{-1}(\mathbf{L} + \mathbf{U})]\,\mathbf{x}(j) + \mathbf{D}^{-1}\mathbf{b} \tag{7}$$

the Gauss-Seidel algorithm

$$\mathbf{x}(j+1) = [(\mathbf{D} - \mathbf{L})^{-1}\mathbf{U}]\,\mathbf{x}(j) + (\mathbf{D} - \mathbf{L})^{-1}\mathbf{b} \tag{8}$$

and the successive overrelaxation (SOR) algorithm

$$\mathbf{x}(j+1) = \{(\mathbf{D} - \omega\mathbf{L})^{-1}[(1 - \omega)\mathbf{D} + \omega\mathbf{U}]\}\,\mathbf{x}(j) + \omega(\mathbf{D} - \omega\mathbf{L})^{-1}\mathbf{b} \tag{9}$$


In (6) and (9), ω is an acceleration or scaling parameter that regulates the rate of
convergence and appropriately scales the eigenvalues to ensure convergence. The choice of
one of these four iterative algorithms depends on many factors that are highly application
and problem dependent (e.g., convergence of the algorithm, the dynamic range of the
matrices, the number of iterations required, and the ease of implementation). The
Gauss-Seidel algorithm in (8) is equivalent to the SOR algorithm in (9) when ω = 1.
Convergence of the algorithms in (7)-(9) requires that A have various specific properties
[32], [33]. In (6), A must be completely stable or unstable; in (7), A must be strongly
diagonally dominant; and in (8) and (9), A must be positive definite (i.e., have only
positive eigenvalues). Calculation of ω in (9) imposes other matrix conditions [32], [33].
We have chosen to concentrate on the Richardson algorithm in (6) because selection of ω
for stability and convergence is quite easy, because convergence is ensured when A (or -A)
is stable (i.e., when all eigenvalues lie strictly in the left (right)-half plane), and
because its optical implementation is easy to detail. The selection of ω and stopping
criteria are discussed later.
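
A short numerical sketch can make the splitting in (5) and the updates in (7)-(9)
concrete. The code below is an illustration under assumptions that are not taken from the
paper: D is taken as the diagonal of A, and L and U as the negated strictly lower and
strictly upper parts (the conventional choice); the test matrix is a hypothetical
symmetric, diagonally dominant example; and the relaxation value 1.1 is arbitrary. Each
step solves a small linear system rather than forming an inverse explicitly.

```python
import numpy as np

def split(A):
    """Return D, L, U with A = D - L - U (assumed strict-triangular convention)."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)
    U = -np.triu(A, 1)
    return D, L, U

def jacobi_step(A, b, x):
    D, L, U = split(A)
    return np.linalg.solve(D, (L + U) @ x + b)                  # update (7)

def gauss_seidel_step(A, b, x):
    D, L, U = split(A)
    return np.linalg.solve(D - L, U @ x + b)                    # update (8)

def sor_step(A, b, x, omega):
    D, L, U = split(A)
    rhs = ((1.0 - omega) * D + omega * U) @ x + omega * b
    return np.linalg.solve(D - omega * L, rhs)                  # update (9)

def iterate(step, A, b, n_iter=200, **kwargs):
    x = np.zeros_like(b)
    for _ in range(n_iter):
        x = step(A, b, x, **kwargs)
    return x

# Hypothetical symmetric, diagonally dominant test system
A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x_exact = np.linalg.solve(A, b)
x_jacobi = iterate(jacobi_step, A, b)
x_gs = iterate(gauss_seidel_step, A, b)
x_sor = iterate(sor_step, A, b, omega=1.1)
```

For this particular matrix all three iterations converge to the same solution, and a
single SOR step with ω = 1 reproduces the Gauss-Seidel step, consistent with the remark
above.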

To understand how such iterative algorithms are implemented using matrix-vector
processors, let us consider the use of (6) to solve Ax = b. At iteration one, we use our
initial estimate x(0) of x and form Ax(0) (this requires a matrix-vector multiplication).
We then subtract this from the vector b, multiply the result by ω, and add x(0) to the
result. This produces the right-hand side of (6) and hence our next estimate x(j+1), which
is then fed back to the input of the system. We then repeat the above operations with
x(1). These iterations continue until x(j+1) ≈ x(j). Stopping criteria for iterative
algorithms are discussed below. When x(j+1) ≈ x(j), we see that (6) reduces to Ax = b,
where x = x(j), and thus the resultant x is the solution x = A^{-1}b to Ax = b. This
iterative algorithm thus requires successive matrix-vector multiplications and vector
addition at each iteration. The basic element of a processor to iteratively solve LAEs is
thus a matrix-vector multiplier.
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Let  us  now  consider  the  use  of  in-*  systems  of  Figs  2-4  to 
realize the iterative solution in (6) to systems of LAEs. For the banded-matrix architecture
of Fig. 2, the matrix is fed to the input point modulators and the vector to the AO cell, and
a new vector element is produced at the output detector each $T_B$. The aperture time
$T_A = NT_B$ for the AO cell is often such that N is much larger than the bandwidth of banded
matrices. Thus the input point modulators can be located at appropriate positions near the
upper end of the AO cell, and the lower portion of the AO cell can be used to store the
calculated vector elements as they are produced. In this case, newly produced vector data (one
element per $T_B$) at the output are used to produce the right-hand side of (6) and are then
immediately fed to the new data slot available at the bottom of the AO cell. Such an
architecture is shown in Fig. 5 for the system of Fig. 2 (a degenerate case of Fig. 4 with one
output detector). This system requires only one detector, a one-channel resistive subtractor
and adder, and a single operational amplifier [10]. To iteratively solve a set of Toeplitz
LAEs, the systems of Figs. 2 or 3 can be used as the basic block with feedback to the AO cell
[10].

Next, we consider the solution of LAEs with general matrices without special structure. In
this case, the architecture of Fig. 4 (or similar ones that produce one matrix-vector
multiplication per $T_B$) is considered as the basic element in the system. For the specific
system of Fig. 4, the vector x(j) data are fed in parallel to the input point modulators
and  the  matrix  A  data  are  fed  to  the  AO  cell  as  a[  I,  f].  After 
the  matrix  data  have  been  loaded  into  the  AO  cell  (after  a 
latency time $NT_B$), the elements of the vector x(j) are applied to the input point
modulators. Immediately, the matrix-vector product Ax(j) is produced in parallel on the output
detectors. This output matrix-vector product is read out in parallel, operated upon by the N
elements of the vector b, etc., to produce the right-hand side of (6) in dedicated analog
hardware. The new x(j+1) vector data are then fed back to the input point modulators in
parallel. At the next $T_B$ bit time, the next iteration occurs. Thus, one iteration occurs
every $T_B$, and the data flow in the system is such that the processor is kept fully active,
i.e., the data-handling requirements of the output vector data produced and the input vector
data required are simultaneously satisfied by the feedback arrangement in a system which is
easily synchronized. The associated architecture and data flow for this algorithm are shown in
Fig. 6. The system of Fig. 5 employs the data encoding scheme in line one of Table 1, and the
system of Fig. 6 uses the encoding noted in line two of Table 1. After $NT_B$ of time, the
system of Fig. 6 has performed N iterations of the Richardson algorithm in (6). If the problem
is such that the iterations will not converge sufficiently in this amount of time, then the
matrix contents of the AO cell are constantly recycled, i.e., as matrix data reach the upper
end of the cell, the associated new matrix data are immediately reentered into the bottom of
the cell. The length of the cell need only satisfy $T_A = NT_B$ (where N is the size of the
matrix to be processed), and only N input point modulators are required [34]. In the case
where the aperture of the cell satisfies $T_A = 2NT_B$, new matrix or vector data can be
entered into the bottom of the cell as required for the next operation following the solution
of the LAEs. As we briefly discuss in Section V, the solution of LAEs is rarely the only
operation to be performed in advanced modern signal processing. As further extensions of
iterative algorithms, we note that (6) and the associated processors can also be realized for
the case when A, b, and ω are time-varying functions of the iteration index j. This extension
allows the general LAE solution presented to be extended to time-varying stochastic
gradient-following algorithms in adaptive filtering and signal processing.

Fig. 5. Simplified schematic of an optical systolic processor [10] to solve banded-matrix LAEs
by indirect algorithms.

Fig. 6. Simplified schematic of an optical processor [13] to solve LAEs with general matrices
using indirect algorithms.

Experimental demonstrations of the algorithm in Fig. 4 for a Toeplitz matrix in a
deconvolution application [10] and for the system of Fig. 5 for a full matrix [34] have been
reported. Iterative algorithms require attention to stability, convergence, the choice of ω,
and the stopping criteria. These issues are problem-dependent, but the required choices are
easily obtained given the available a priori problem specifics. Iterative algorithms appear
essential for the solution of eigensystems and for singular value decomposition, as noted in
Section V, and thus much future work on such algorithms is expected. In general, the
successful use of iterative algorithms requires slight application-dependent algorithm and
matrix modifications. Specific examples will be detailed in future publications. However,
several general guidelines are advanced below.

Let us now address the general guidelines for parameter selection in indirect algorithms. We
will consider selection of ω and the stopping criteria. For such analyses, we believe that one
should utilize deterministic engineering techniques and digital simulation rather than formal
mathematical analyses (since formal analyses are valid only "in the limit") and develop tight
upper bounds from analytical models to characterize convergence of the algorithm [27]. We will
first develop the general expressions for the case of the iterative algorithm in (6) and then
separately address selection of ω and the stopping criteria. We first note that the right-hand
side of (6) is the difference between the calculated solution x(j) at iteration j and the
weighted error ω[Ax(j) - b] in the exact solution. The algorithm successfully reduces this
weighted error. This is best seen by writing the computational error e(j) at the jth iteration
as the difference between the calculated solution x(j) at the jth iteration and the exact
solution $x^{*} = A^{-1}b$. The error vector is thus

$$e(j) = x(j) - x^{*} \tag{10}$$

The error vectors on successive iterations can be related by

$$e(j+1) = [I - \omega A]\,e(j) \tag{11}$$
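
The step from the error definition in (10) to the recursion in (11) can be written out explicitly. The derivation assumes that (6) has the form x(j+1) = x(j) + ω[b - Ax(j)], which is inferred from the description of the iteration earlier in this section, and it uses the fact that the exact solution satisfies $b = Ax^{*}$:

$$
\begin{aligned}
e(j+1) &= x(j+1) - x^{*} \\
       &= x(j) + \omega\,[\,b - A x(j)\,] - x^{*} \\
       &= [\,x(j) - x^{*}\,] - \omega A\,[\,x(j) - x^{*}\,] \\
       &= [\,I - \omega A\,]\,e(j)
\end{aligned}
$$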

After j Richardson iterations, the error e(j) is related to the initialization error e(0) by

$$e(j) = [I - \omega A]^{j}\,e(0) \tag{12}$$

To facilitate selection of a fixed number J of iterations, we require a tight upper bound on
the norm of the computational error in (12). The classic upper bound is

$$\|e(J)\| \le \|I - \omega A\|^{J}\,\|e(0)\| \tag{13}$$

Let us now consider the choice of ω and then return to evaluating (13). It is well known [35]
that for the eigenvalues of [I - ωA] to be less than unity in magnitude, one uses

$$0 < \omega < 2/\lambda_{\max} \tag{14}$$

and that $\lambda_{\max}$ of A is bounded by the Euclidean norm as

$$\lambda_{\max} \le \|A\| = \left[\sum_{m=1}^{N}\sum_{n=1}^{N} |a_{mn}|^{2}\right]^{1/2} \tag{15}$$

Since the upper bound in (15) is weak, we select

$$\omega = p/\|A\| \tag{16}$$

where p is a problem-dependent constant greater than two.

In our algebraic Riccati equation (ARE) solutions (Section VI-C), we have empirically [8],
[36] selected p = 3 and consistently achieved excellent performance in over ten cases
investigated.

If A is symmetric and ω satisfies (14), the norm ||I - ωA|| in (13) is well approximated [33]
by the spectral radius (the largest eigenvalue, in absolute value) of [I - ωA], i.e., by
(1 - 1/C), where the condition number $C = \lambda_{\max}/\lambda_{\min}$ (the ratio of the
largest to smallest eigenvalues of A, in absolute value). Substituting this into (13), we
obtain

$$\|e(J)\| \le [1 - 1/C(A)]^{J}\,\|e(0)\| \approx \exp[-J/C(A)]\,\|e(0)\| \tag{17}$$

Equation (17) describes the performance (convergence) of the Richardson algorithm for J
iterations. We see that it is determined by C, and thus from an estimate of C we can fix the
number of iterations at a constant number J to achieve a given accuracy or error in (17).
Selection of a fixed J is quite problem-dependent. In our work in Section VI-C, we found that

$$J = 3.0\,C \tag{18}$$

has yielded excellent performance for those cases considered. In specific cases, i.e., if A is
the covariance matrix, C can often be approximated by the ratio of the strengths of the
signals expected [37]. Since the exact answer is rarely known and in some cases an estimate of
C is not easily obtained, one can simply continue the iteration until the norm of the
difference between successive iterates is below a preset error threshold [35]. Goodman and
Song [38] have shown that this approach is also helpful in reducing the effect of noise in an
optical iterative processor. As noted at the outset, the effective use of iterative algorithms
is very problem-dependent and merits further research.
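
To make the role of the condition number concrete, the bound in (17) and the rule in (18) can be evaluated numerically. The following is a minimal sketch; the function names `error_bound` and `iterations_for`, the value C = 10, and the 1 percent target are hypothetical choices for illustration and do not come from the cases studied in the text.

```python
import math


def error_bound(C, J):
    """Bound of (17) on ||e(J)|| / ||e(0)|| for condition number C."""
    return (1.0 - 1.0 / C) ** J


def iterations_for(C, target):
    """Smallest J for which the exponential form exp(-J/C) of (17) is <= target."""
    return math.ceil(C * math.log(1.0 / target))


C = 10.0                                   # illustrative condition number
J_rule = round(3.0 * C)                    # the empirical rule J = 3.0 C of (18)
bound_at_rule = error_bound(C, J_rule)     # relative error bound after J_rule iterations
J_for_1_percent = iterations_for(C, 0.01)  # iterations for a 1 percent bound
```

Because 1 - 1/C never exceeds exp(-1/C), the power form of the bound is always at least as tight as the exponential form, so an iteration count chosen from the exponential form is on the safe side.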

IV. Parallel Systolic Direct Algorithms for the Solution of Systems of LAEs

The iterative solution of LAEs (Section III) was an obvious choice for the initial optical
matrix-vector processors using fixed 2-D spatial masks to store the matrix data spatially [8].
However, with the advent of systolic processors using AO cells, a different algorithm
philosophy (direct solutions to LAEs) emerged [39], [40]. Since the matrix data in systolic
processors shift through the AO cell one row or column per $T_B$, a new vector row or column
of a matrix must be fed to the AO (or other) transducer every $T_B$ (i.e., the matrix in an
optical systolic processor must be updated each $T_B$). Research on direct algorithms for the
optical solution of LAEs and for matrix decomposition is thus quite new. However, parallel
algorithms for LU [28], QR [29], and Cholesky [28] matrix decomposition and parallel
algorithms for the solution of triangular LAEs [30] and general LAEs using optical systolic
processors have been detailed and published during the past year. In this section, we
summarize this research. In Section V, we discuss various possible extensions of these initial
matrix decomposition algorithms to more advanced optical systolic processors.

The general philosophy in matrix decomposition solutions to LAEs is to convert the given
Ax = b LAE problem into a simpler one where the new matrix has specific structure that allows
the solution of a simpler matrix-vector equation (by easily implemented techniques such as
forward or backward substitution). Most matrix decomposition techniques are variants of
Gaussian elimination. The two conditions used in the various Gaussian-elimination-based
algorithms are that 1) the elements of any row of the matrix A can be multiplied by a nonzero
real number, and 2) a constant multiple of any row can be added to the associated elements of
any other row. All matrices produced by these operations are equivalent. The two classic
direct LAE solutions are LU (triangular) and QR (orthogonal) matrix decomposition. We thus
discuss these two algorithms and their optical systolic processor realization in this section.
In Section V, we discuss advanced applications of other decomposition algorithms to other
modern signal processing problems.

In LU decomposition, the matrix A in the original Ax = b problem is decomposed into a lower L
and upper U triangular matrix, where the diagonal elements of L are all unity and the
resulting decomposition is unique. Thus the original Ax = b problem (where x is unknown)
becomes (substituting A = LU) two problems. First, one can solve LUx = b for y = Ux, and
second, solve Ux = y for x, where U and y are known from the LU decomposition and the first
triangular system solution in step one. Each of these subproblems requires the solution of a
lower or upper triangular system of equations. This is trivial on-line in dedicated digital
hardware. In QR matrix decomposition, the matrix A is factored or decomposed into an
orthogonal matrix Q (such that $Q^{T} = Q^{-1}$ or $QQ^{T} = I$) and an upper triangular
matrix R. In this case, the original Ax = b LAE problem reduces to
$Rx = Q^{-1}b = Q^{T}b = b'$, i.e., the solution of another simplified triangular system of
equations Rx = b' as before. Achieving the matrix decomposition in either of these algorithms
was recognized early [39] as the major computational step in such algorithms. We thus first
detail how to achieve LU and QR decomposition (Sections IV-A and IV-B) on optical systolic
processors. Then, we address an optical solution to the triangular system of equations that
results (Section IV-C), and finally (Section IV-D), we detail a full direct systolic solution
in N matrix-matrix multiplication steps. The major reason for interest in direct versus
indirect solutions to LAEs is that the number of iterations required in an indirect solution
(Section III) is not easily quantified and is thus highly problem-dependent. Conversely,
direct solutions require a fixed number of steps N (the order of the matrix). The parallel
aspects of direct LAE solutions must be properly exploited; the algorithms should not be
implemented on systolic architectures as they appear in conventional linear algebra
descriptions. Our algorithms and architectures will demonstrate such parallel guidelines and
the best use of matrix decomposition with systolic processors.

A. Optical Systolic Realization of LU Matrix Decomposition [28]

All matrix decomposition solutions to Ax = b involve multiplying A and b by a decomposition
matrix $P_0$ (this yields $P_0A = A_1$ and $P_0b = b_1$), multiplying $A_1$ and $b_1$ by a
matrix $P_1$, etc. After N such matrix-matrix multiplications, one obtains a matrix PA and a
vector Pb = b', where $P = P_{N-1} \cdots P_1 P_0$. In matrix decomposition, each $P_mA_m$
multiplication only affects columns (or rows) m through N of $A_m$.

In LU decomposition, $P_m$ is chosen to force the elements below the diagonal in the mth
column of $A_m$ to be zero. On successive cycles, we require a matrix-matrix multiplication
and a matrix-vector multiplication. These operations are combined (since the same matrix is
used in both) into the multiplication of the matrix $P_m$ by the augmented matrix
$[A_m; b_m]$. Each successive cycle of the system thus requires a matrix-matrix
multiplication, calculation of one column of $P_m$, and assembly of the $P_m$ matrix. On each
successive cycle, we produce one row of the upper triangular matrix U, one element of the new
b vector (from the matrix-matrix multiplication), and one column of $P = L^{-1}$ (from the
calculations of the $P_m$ matrix). This is achieved as detailed below. $P_m$ is an identity
matrix except for column m, whose elements $p_{im}^{(m)}$ are well-known and easily calculated
[28] functions of the elements of the mth column of $A_m$, i.e.,

$$p_{im}^{(m)} = -a_{im}^{(m)} / a_{mm}^{(m)}, \quad i > m \tag{19}$$

where superscripts denote the step or matrix-matrix multiplication number. If A is neither
strictly diagonally dominant nor positive-definite, pivoting (i.e., interchanging of rows of
the matrix) is necessary to ensure that the magnitude of (19) is less than unity.
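
As a purely numerical illustration of this procedure (not of the optical data flow), the repeated multiplication of the augmented matrix by the matrices $P_m$ can be sketched as follows. The sketch makes several simplifying assumptions: it uses full-size matrices on every cycle rather than removing one row and column per cycle, it applies N - 1 elimination matrices (the last column needs no elimination), it takes the multipliers with a negative sign so that $P_mA_m$ zeroes the subdiagonal, and it does no pivoting. The names `elimination_matrix` and `lu_by_elimination` and the test data are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(P, M):
    """Product of an n x n matrix P with an n x k matrix M (lists of rows)."""
    return [[sum(P[i][t] * M[t][j] for t in range(len(M)))
             for j in range(len(M[0]))] for i in range(len(P))]


def elimination_matrix(M, m):
    """P_m: identity except column m, which holds -a_im / a_mm below the diagonal."""
    n = len(M)
    P = identity(n)
    for i in range(m + 1, n):
        P[i][m] = -M[i][m] / M[m][m]     # assumes a nonzero pivot (no pivoting)
    return P


def lu_by_elimination(A, b):
    """Apply P_0, P_1, ... to the augmented matrix [A; b]; return (U, b', P)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A; b]
    P_total = identity(n)
    for m in range(n - 1):
        P = elimination_matrix(M, m)
        M = matmul(P, M)                 # one matrix-matrix multiplication per cycle
        P_total = matmul(P, P_total)     # accumulated product P
    U = [row[:n] for row in M]
    b_new = [row[n] for row in M]
    return U, b_new, P_total


A = [[4.0, 3.0, 2.0], [2.0, 4.0, 1.0], [2.0, 1.0, 5.0]]
b = [9.0, 7.0, 8.0]
U, b_new, P = lu_by_elimination(A, b)
```

The returned `P` satisfies PA = U and Pb = b', so the remaining work is a single upper triangular solution.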

The data flow for use of a systolic processor for LU decomposition is shown in Fig. 7. The
matrix A augmented by the vector b (i.e., $[A_m; b_m]$) is fed to the AO cell and multiplied
by $P_m$ to yield the new augmented matrix $[A_{m+1}; b_{m+1}]$. After each matrix-matrix
multiplication, one row of the final U matrix and one element of the final b vector are
produced (i.e., the first row of $A_{m+1}$ and the first element of $b_{m+1}$ are in final
converted form). One additional row and column of the augmented matrix just calculated
$[A_{m+1}; b_{m+1}]$ is thus not altered or needed in each subsequent matrix-matrix
multiplication. Thus we remove one row and column of the matrix product produced at each cycle
and reduce the order of each subsequent matrix-matrix multiplication by one. The remaining
elements of $A_{m+1}$ and $b_{m+1}$ are fed back to the AO cell and the processor as they are
produced (one row at a time in parallel). The new elements to be calculated in $P_{m+1}$
require only the elements of one column m + 1 of the new $A_{m+1}$ matrix as in (19). Since
one row of $A_{m+1}$ is produced in parallel each $T_B$, the column elements of $A_{m+1}$
needed to compute $P_{m+1}$ are available one element each $T_B$ (from the same output
detector). As each element of the proper $A_{m+1}$ column is produced, the element in the
corresponding column of $P_{m+1}$ is calculated during $T_B$ and stored.

Fig. 7. Simplified schematic of an optical systolic processor [28] to perform LU matrix
decomposition of a general matrix by Gaussian elimination.

These calculations in (19) are performed in special-purpose analog hardware in the box noted
in Fig. 7. These operations are easily achieved during one $T_B$ since only one element per
row of $P_{m+1}$ must be computed. Similarly, storage of these $P_{m+1}$ values and formatting
the $P_{m+1}$ matrix (an identity matrix except for one column) for input to the point
modulators one row at a time in parallel is easily achieved [28]. The operations that the
special-purpose analog hardware must perform are inversion of the first element of the
appropriate column of $A_{m+1}$ during the first $T_B$. At subsequent $T_B$ times, this
element is multiplied by the new column elements of $A_{m+1}$ to generate the appropriate new
elements in the new column in $P_{m+1}$ as defined in (19). Each new row of $P_{m+1}$ defines
a row of L, which is available as an output as shown in Fig. 7. After each matrix-matrix
multiplication, one column of $L^{-1}$, one row of U, and one element of b are computed. After
N matrix-matrix multiplications, the full LU decomposition has been achieved and $L^{-1}$, U,
and b have been produced. Since the order of the matrix is reduced by one on each cycle (and
assuming one matrix-vector multiplication each $T_B$), the matrix decomposition (including
$NT_B$ of latency time to load the cell) thus requires a total time

$$[N + (N-1) + (N-2) + \cdots + 2]\,T_B = [(N^{2} + N - 2)/2]\,T_B \tag{20}$$

or (for large N) approximately $N^{2}T_B/2$, during which (for large N) approximately
$N^{3}/3$ multiplications and additions are performed.

B. Parallel Optical Systolic Realization of QR Matrix Decomposition [29]

QR decomposition or orthogonal matrix factorization (A = QR) can be accomplished [41], [42] by
modified Gram-Schmidt orthogonalization, Householder plane reflections [41], or Givens plane
rotations [43]. The first method requires transposing a matrix and performing two
matrix-matrix multiplications to produce one column of Q and R. The last method requires one
matrix-matrix multiplication to produce one element of Q and R. Thus the Householder QR
decomposition appears to be the most practical and most easily paralleled algorithm for QR
factorization since it produces one column and row of the final decomposed $A_m$ matrix in
each step. The basic steps in this operation are similar to those in Fig. 7 for LU
decomposition, i.e., successive multiplication of A and b by a decomposition matrix $P_0$ to
produce $A_1$ and $b_1$, multiplication of this matrix and vector by $P_1$ to produce $A_2$
and $b_2$, etc. The decomposition matrices in QR decomposition are different from those in LU
decomposition. Each successive matrix-matrix multiplication produces one row of the final
upper triangular matrix R. After N such matrix-matrix multiplications, we obtain PA = R and
Pb = b', where $P = Q^{-1} = Q^{T}$. QR decomposition yields, after each matrix-matrix
multiplication, one row of R and one row and column of $Q^{T}$, as in LU decomposition. One
row and column of the computed augmented matrix $[A_{m+1}; b_{m+1}]$ are not needed on
subsequent matrix-matrix multiplications. Thus the order of the matrix-matrix multiplications
can be reduced by one on each subsequent cycle. This will represent a considerable reduction
in the computational time and a corresponding improvement in system performance, as we note in
Section IV-D. The
full $A_m$ matrix after cycle m will have the structure

$$A_m = \begin{bmatrix} R & \times \\ 0 & W_{N-m} \end{bmatrix} \tag{21}$$

where R is an upper triangular matrix. On subsequent steps only W is changed ($W_{N-m}$
denotes that W for $A_m$ is of order N - m). To calculate the next $P_{m+1}$ decomposition
matrix, only the elements of the first column of W are needed. We denote the first column of W
by the column vector $w_{m+1}$. The equation to generate $P_m$ is [41]

$$P_m = I - k_m u_m u_m^{T} \tag{22}$$

The vector $u_m$ in (22) is the same as $w_m$ except for the first element, which is

$$u_1 = w_{mm} + \sigma_m\,\mathrm{sign}(w_{mm}) \tag{23}$$

where $\sigma_m$ is the norm of $w_m$, obtained from the vector inner product of $w_m$ with
itself, i.e.,

$$\sigma_m^{2} = \sum_{i=1}^{N-m} w_{i,m}^{2} \tag{24}$$

and the constant $k_m$ is

$$k_m = [\sigma_m^{2} + \sigma_m |w_{mm}|]^{-1} \tag{25}$$

The steps in a Householder QR decomposition thus involve 1) calculation of $P_m$ (this
requires a vector outer product), and 2) the matrix-matrix and matrix-vector multiplications
$P_mA_m = A_{m+1}$ and $P_mb_m = b_{m+1}$. The second operation is performed on the system of
Fig. 4 using the data encoding in the last line of Table 1. Since calculation of $P_m$ in (23)
requires only one column of the new A matrix, we generate A one column at a time rather than
one row at a time (using the data encoding noted). Step one is clearly the critical operation
in terms of data flow and efficient processor use. Since one column of A can be produced in
parallel on the system, the vector outer product operation can be performed on the optical
system of Fig. 4 by several methods [29], [39]. If A is symmetric, we can [29] operate with
the transposed $A^{T}$ matrix and utilize the symmetry of A, $A_m$, and $P_m$. For a
nonsymmetric A, one can [39] produce one column u of W in parallel and (in N cycles) compute
the vector outer product in (22), compute the norm from the trace of $u_mu_m^{T}$, and
evaluate $k_m$. Both of these approaches require intermediate data storage and leave the
processor inactive during fill times of the AO cell. More attractive data flow results if the
optical system in Fig. 8 (using two crossed point-modulator arrays) is used to perform the
vector outer product. This vector outer-product system of Fig. 8 is detailed elsewhere [55].
It involves imaging the modulator at plane $P_1$ horizontally onto the output plane $P_3$,
with $P_1$ compressed vertically and expanded horizontally as shown to uniformly illuminate
$P_2$, and with $P_2$ imaged horizontally onto the $P_3$ output plane. More attractive data
flow within the optical matrix multiplier results if rows are fed in parallel with data
encoding achieved as in the LU processor of Fig. 7.

Fig. 8. Simplified schematic of an optical systolic vector outer product processor (adapted
from [55]).

The use of the combined architectures of Figs. 4 and 8 for QR matrix decomposition is shown in
Fig. 9. In this architecture, one row of $A_{m+1}$ is produced in parallel on the output
detectors. The first row produced (one row of R) is an output and the remaining rows are
reinserted into the AO cell. When the first column of $A_{m+1}$ has been produced, it is fed
in parallel (or sequentially as it is produced) to the outer-product processor of Fig. 8,
which produces $u_mu_m^{T}$ in parallel. This 2-D symmetric output in Fig. 8 is read out one
line at a time in parallel, $\sigma_m$ is calculated from the trace of $u_mu_m^{T}$ (with
parallel output detectors along diagonals), $k_m$ is formed, and then $P_{m+1}$ is assembled
one row at a time in parallel in simple analog hardware and fed to the input point modulators.
Thus after one cycle, the necessary rows and columns of $[A_{m+1}; b_{m+1}]$ are in the AO
cell and the first row of the new $P_{m+1}$ matrix is available at the input point modulators.
The next cycle can thus begin immediately.

After each matrix-matrix multiplication, one row of Q and R and one element of b are formed.
Data flow in this system is ideal, and full advantage is made of the reduced matrix order on
each cycle. Assuming a negligible time to produce the vector outer product and assemble each
row of $P_m$ (this is realistic), the system of Fig. 9 performs a QR decomposition in the same
$(N^{2}/2)T_B$ time as in (20).

Fig. 9. Simplified schematic of an optical systolic processor [29] to perform QR matrix
decomposition on a general matrix.
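
The Householder construction in (22)-(25) can be checked numerically with a short sketch. As with any software illustration of this section, it is not the optical architecture: it keeps every $P_m$ at full size (zero-padding $u_m$) instead of reducing the matrix order on each cycle, applies N - 1 reflections, and assumes each column norm is nonzero. The names `householder_matrix` and `householder_qr` and the test data are hypothetical.

```python
import math


def matmul(P, M):
    """Product of an n x n matrix P with an n x k matrix M (lists of rows)."""
    return [[sum(P[i][t] * M[t][j] for t in range(len(M)))
             for j in range(len(M[0]))] for i in range(len(P))]


def householder_matrix(M, m):
    """Build P_m = I - k_m u_m u_m^T from column m of M (rows m and below)."""
    n = len(M)
    w = [M[i][m] for i in range(m, n)]                 # first column of W
    sigma = math.sqrt(sum(wi * wi for wi in w))        # norm of w_m, as in (24)
    sign = 1.0 if w[0] >= 0.0 else -1.0
    u = [0.0] * m + w                                  # zero-padded so P_m is n x n
    u[m] = w[0] + sigma * sign                         # first element, as in (23)
    k = 1.0 / (sigma * sigma + sigma * abs(w[0]))      # as in (25); assumes sigma > 0
    return [[(1.0 if i == j else 0.0) - k * u[i] * u[j] for j in range(n)]
            for i in range(n)]


def householder_qr(A, b):
    """Reduce the augmented matrix [A; b] to [R; b'] and accumulate P = Q^T."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    Qt = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for m in range(n - 1):
        P = householder_matrix(M, m)
        M = matmul(P, M)          # P_m applied to the augmented matrix
        Qt = matmul(P, Qt)        # running product of the reflections
    return [row[:n] for row in M], [row[n] for row in M], Qt


A = [[4.0, 3.0, 2.0], [2.0, 4.0, 1.0], [2.0, 1.0, 5.0]]
b = [9.0, 7.0, 8.0]
R, b_new, Qt = householder_qr(A, b)
```

With this choice of $k_m$, each reflection maps the current column $w_m$ onto a multiple of the first unit vector, which is what places the zeros below the diagonal of R.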

C. Parallel Optical Systolic Processors for the Solution of Triangular LAEs [30]

In the architecture of Fig. 7, we detailed how all of the matrices and vectors associated with
LU decomposition could be calculated in parallel, i.e., PA = U, Pb = b', and $P = L^{-1}$. In
the conventional second and third steps in a direct LU decomposition solution to a system of
LAEs, one solves Ly = b for y and then Ux = y for x. The latter triangular LAEs can easily be
solved by back or forward substitution in dedicated hardware and on digital systolic
processors. Ghosh and Casasent [30] have also noted that such triangular systems can be solved
optically and have detailed the solution for the Ly = b case (the Ux = y case follows directly
as noted in Section IV-D). The system of Fig. 10 achieves this and demonstrates the general
algorithm and architecture for an optical systolic processor to solve a triangular system of
equations. The data flow for the use of a lower triangular system of equations is described.
We assume that one row of L and one element of b' are produced in parallel on each cycle. If
these elements are fed to the input point modulators and a one-channel adder as in Fig. 10(b),
the solution $x = L^{-1}b'$ is obtained sequentially from the system. The algorithm used is
[30]

$$x_m = \left(b'_m - \sum_{n=1}^{m-1} l_{mn}x_n\right)(1/l_{mm}) \tag{26}$$

where $l_{mm}$ is the associated diagonal element of L. For L, its diagonal elements are unity
in LU decomposition and thus $1/l_{mm} = l_{mm}$ involves no additional calculations. The data
flow for this case is shown in Fig. 10(b) and the interconnections between Figs. 7 and 10(b)
are shown in Fig. 10(a). This is the most efficient method for a direct solution of LAEs as we
discuss in the following section.
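
The sequential nature of (26), in which one solution element is produced per step from one row of L and one right-hand-side element, can be sketched as follows. The function name `forward_substitution` and the numerical values are hypothetical and serve only to illustrate the recurrence.

```python
def forward_substitution(L, b):
    """Solve L x = b for lower triangular L, one element per step as in (26)."""
    x = []
    for m, row in enumerate(L):
        # Sum over the already computed elements x_n, n < m.
        acc = sum(row[n] * x[n] for n in range(m))
        x.append((b[m] - acc) / row[m])     # division is trivial when l_mm = 1
    return x


# Unit lower triangular example (illustrative values only).
L = [[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.5, -0.2, 1.0]]
b = [9.0, 7.0, 8.0]
x = forward_substitution(L, b)
```

Each step needs only the current row of L, the current right-hand-side element, and the previously computed solution elements, which is why the elements can be consumed as they are produced.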

D. Parallel Optical Systolic Processor for the Direct Solution of LAEs

Conceptually, it is easiest to view an LU decomposition solution of a system of LAEs as a
sequence of successive matrix-matrix multiplications with the augmented matrix $[A_m; b_m]$.
Such an architecture is shown in Fig. 11 for the implementation of the Gauss-Jordan [41]
algorithm for the direct solution of a system of LAEs. In this algorithm and architecture, N
matrix-matrix multiplications of $P_m$ by the augmented matrix are performed to convert A to
an upper triangular matrix U. N additional matrix-matrix multiplications are then performed to
diagonalize the upper triangular matrix and to properly condition the augmented vector b.
These last N matrix-matrix multiplications accomplish the back-substitution algorithm as
depicted in Fig. 10. In the final augmented matrix, A is an identity matrix and the final b
vector is the desired solution vector x to the original Ax = b problem. The steps in this
Gauss-Jordan algorithm are detailed elsewhere [42], [44]. The required feedback and data flow
for such an algorithm are shown in Fig. 11.

Fig. 10. Simplified (a) and detailed (b) schematic for an optical systolic processor [30] to
solve triangular LAEs.

Fig. 11. Simplified schematic of an optical processor for the direct solution of a system of
LAEs (a combined version of Figs. 10 and 7 or 9).

In this architecture and algorithm, 2(N - 1) successive matrix-matrix multiplications are performed, with all matrices of full size N x N. This architecture is attractive because of the simplified schematic diagram that results, as in Fig. 11. However, its execution time is long: (N - 1)$T_B$ of setup time plus (2N - 2)N$T_B$ for the matrix multiplications, for a resultant calculation time of

$$(N - 1)T_B + (2N^2 - 2N)T_B \approx 2N^2 T_B \tag{27}$$

This is significantly longer than the time required using a combined version of Figs. 7 and 10, as we now detail. First, we note that $L^{-1}$ and b are available as outputs in Fig. 7. We could form $L^{-1}b = y$ (i.e., the solution y of $Ly = LUx = b$) by a simple matrix-vector multiplication. However, the system of Fig. 7 has already produced $L^{-1}b = b'$ as an output and thus, by the use of an augmented matrix, we have already solved the first lower triangular system of LAEs. Hence, if we feed the stored U and $b'$ data produced from Fig. 7 back to the point modulators and AO cell, we can compute $x = U^{-1}b'$ directly in an additional N$T_B$ of time using the system of Fig. 10(b) and the algorithm

$$x_m = \Bigl(b'_m - \sum_{n=m+1}^{N} u_{mn} x_n\Bigr)\bigl(1/u_{mm}\bigr) \tag{28}$$

where $u_{mn}$ denotes the elements of U.
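The back-substitution recursion for an upper triangular system can be sketched in a few lines. This is a minimal digital illustration only; the function name and the list-of-rows matrix layout are assumptions made for the example, and the optical system performs the same arithmetic with point modulators and a serial adder rather than loops.

```python
def back_substitute(U, b):
    """Solve U x = b for an upper triangular U, starting from the last row."""
    n = len(b)
    x = [0.0] * n
    for m in range(n - 1, -1, -1):
        # Sum of the already-known terms u_mn * x_n for n > m.
        acc = sum(U[m][k] * x[k] for k in range(m + 1, n))
        x[m] = (b[m] - acc) / U[m][m]
    return x
```

For example, `back_substitute([[2, 1, 1], [0, 3, 2], [0, 0, 4]], [9, 13, 8])` recovers the solution one element per step, last element first.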

Similar remarks apply for QR decomposition as achieved in Fig. 9. In this case, the original Ax = b problem is converted to $Rx = Q^T b = b'$, where R and $b'$ are outputs from Fig. 9. This upper triangular system of equations can thus be solved in an additional N$T_B$, as noted above, for the final x solution. Thus in a direct solution implemented as above with augmented matrices, the size of the matrix is reduced by one on each cycle, and U or R and the new vector $b'$ are stored. Then, U or R is fed to the point modulators and $b'$ to a serial adder (as in Fig. 10), and in an additional N$T_B$ after the decomposition the final solution is produced. The total time for the LU or QR versions of such an algorithm is thus the time in (20) plus N$T_B$, or

$$(N^2/2)T_B + N T_B \approx (N^2/2)T_B \tag{29}$$

for N large. This is one-fourth the time required in (27), and storage is a factor of N less. Propagation of the full matrix data can introduce and accumulate additional errors, since the many 0 and 1 matrix elements in such a full matrix propagation may not be identically 0 and 1. Thus it is preferable to use the architecture of Fig. 11 as in Figs. 7 or 9 and, on the last cycle, to feed U or R to the input point modulators. Such an architecture will thus achieve a full direct LU or QR solution to a system of N LAEs in $N^2 T_B/2$ of time.
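The one-fourth figure can be checked numerically from the two timing expressions. The sketch below simply evaluates the exact forms of (27) and (29) in units of $T_B$; the function names are illustrative and the default `t_b = 1.0` is an arbitrary normalization.

```python
def gauss_jordan_time(n, t_b=1.0):
    """Exact left-hand side of (27): setup plus 2(N - 1) full products."""
    return (n - 1) * t_b + (2 * n * n - 2 * n) * t_b


def decomposition_time(n, t_b=1.0):
    """Exact left-hand side of (29): decomposition plus back-substitution."""
    return (n * n / 2) * t_b + n * t_b


def speedup(n):
    """Ratio of the two times; it approaches 4 as N grows."""
    return gauss_jordan_time(n) / decomposition_time(n)
```

For small N the ratio is somewhat below 4 because the N$T_B$ back-substitution term in (29) is not yet negligible.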

In closing this section, we note that most of the proposed optical systolic processors (Section II) can implement the various algorithms (Figs. 5-11) with different degrees of parallelism and with associated detector complexity (depending upon the precise architecture). Specific issues of the detector system readout and the data and operational flow must be detailed for each system to assess the best architecture to use for a given application and operation.

V. Extensions of Basic Algorithms and Operations

The matrix-vector, matrix-matrix, and triple-matrix multiplication operations (Section II), plus the matrix decomposition methods (Section IV) and the techniques for solving LAEs (Sections III and IV), represent the basic operations required in advanced linear algebra processors. Speiser [5] has defined the fundamental needs of a modern real-time matrix or systolic processor to include the above operations plus the solution of eigensystems, singular value decomposition (SVD), and least squares solution. In this section, these latter operations, other matrix decomposition algorithms, and various extensions of the basic operations are briefly discussed. Emphasis is given to those methods that are most suitable for parallel implementation on optical systolic processors, to those methods for which an optical systolic processor implementation has thus far been detailed, and to the most stable and preferred algorithms. For reasons of space, all of these discussions must be quite brief. The complexity of different problems and operations is usually described by the number of multiplications and additions required for problems of size N. The problems of major concern are of order $N^3$. In Table 2, problems of different order, the name given to each, and examples of each are provided.


Table 2  Complexity Measures for Different Problems (Adapted from [5])

Order    Name      Examples
N        Scalar    Inner Product, IIR Filter
N^2      Vector    Linear Transforms, Fourier Transform, Convolution, Correlation, Matrix-Vector Products
N^3      Matrix    Matrix-Matrix Products, Matrix Decomposition, Solutions of Eigensystems or LAEs or Least Squares Problems

In many cases, the system of LAEs is overdetermined (i.e., there are more equations than unknowns). In this case, A in Ax = b is a matrix with M rows and N columns, where M > N. The conventional least squares solution to minimize $\|Ax - b\|^2$ results in the classical Gauss-normal equation $A^T A x = A^T b$. One can solve this as LAEs where the matrix is $A^T A$ (a square matrix) and the vector is $A^T b$. This is not attractive, since the new matrix has a condition number that is the square of the one in the original problem. This will significantly increase the effect of any computational errors. Modern signal processing solutions to least squares problems employ matrix decomposition by LU, QR, and SVD methods. If a QR decomposition is performed (A = QR), the Gauss-normal equations become $x = (A^T A)^{-1} A^T b$ or $Rx = Q^T b = b'$, and thus from the R matrix and the $b'$ vector (produced, for example, in Fig. 9), only the solution of an upper triangular system is required (this can be achieved on the system of Fig. 10 as detailed in Fig. 11). Similar remarks apply to an LU decomposition. In adaptive beam forming (Section VI), calculation of the adaptive weights can be formulated as a constrained least squares problem. In solving this, the constraints are first removed and a conventional least squares problem results, which can be solved as above. Such advanced adaptive noise cancellation algorithms using direct least squares techniques are attractive since they provide better convergence than the gradient-based algorithm noted in Section VI. They require the matrix decomposition algorithms described in Section IV and below.
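The step from the Gauss-normal equations to the triangular system can be written out explicitly. The short derivation below is a sketch under two assumptions not stated in the text: A has full column rank (so R is nonsingular), and Q has orthonormal columns so that $Q^T Q = I$.

```latex
\begin{aligned}
A^{T}A\,x &= A^{T}b \\
(QR)^{T}(QR)\,x &= (QR)^{T}b \\
R^{T}Q^{T}Q\,R\,x &= R^{T}Q^{T}b \\
R^{T}R\,x &= R^{T}Q^{T}b \qquad (Q^{T}Q = I) \\
R\,x &= Q^{T}b \qquad (R^{T}\ \text{nonsingular})
\end{aligned}
```

Because the product $A^T A$ is never formed, the squared condition number of the normal equations is avoided.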

Solution of eigensystems is a second major problem that arises in modern signal processing algorithms. The newest beam-forming and direction-finding algorithms for high-resolution performance require the solution of a symmetric eigensystem for each resolved temporal frequency [45] for the eigenvectors x and eigenvalues $\lambda$. The most popular algorithms for eigensystem solutions involve the Jacobi method [46], SVD [67], Householder or Givens transformations [41] (to calculate selected eigenvalues), and the QR algorithm [41]. In the QR algorithm, similarity transformations are applied (A is transformed into $B = T^{-1}AT$, where A and B have the same eigenvalues and the eigenvectors y of B are related to the eigenvectors x of A by Ty = x). Using QR decomposition, the matrix Q is calculated such that $Q^T A Q = D$ (where D is approximately diagonal, with small off-diagonal elements). This is achieved by successive matrix decompositions and matrix multiplications, i.e., at step m we decompose $A_m = Q_m R_m$ and form a new matrix $A_{m+1} = R_m Q_m = Q_m^T A_m Q_m$. This procedure is repeated recursively until $Q_m^T A_m Q_m$ is approximately diagonal. The final matrix is $Q = Q_1 Q_2 \cdots Q_m$. The high accuracy achieved with such orthogonal transformations makes their general use most attractive.
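The decompose-then-multiply recursion can be sketched as follows. This is an unshifted QR iteration for a small symmetric matrix, using a modified Gram-Schmidt factorization as a stand-in for whatever decomposition the processor provides; all function names are hypothetical and the fixed step count is an arbitrary choice for the example.

```python
import math


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def qr_gram_schmidt(A):
    """Modified Gram-Schmidt QR of a square, nonsingular matrix."""
    n = len(A)
    cols = transpose(A)  # columns of A, stored as rows
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = list(cols[j])
        for i, q in enumerate(q_cols):
            R[i][j] = sum(qk * vk for qk, vk in zip(q, v))
            v = [vk - R[i][j] * qk for qk, vk in zip(q, v)]
        R[j][j] = math.sqrt(sum(vk * vk for vk in v))
        q_cols.append([vk / R[j][j] for vk in v])
    return transpose(q_cols), R


def qr_eigenvalues(A, steps=50):
    """Repeat A_m = Q_m R_m, A_{m+1} = R_m Q_m and read off the diagonal."""
    Am = [list(map(float, row)) for row in A]
    for _ in range(steps):
        Q, R = qr_gram_schmidt(Am)
        Am = matmul(R, Q)
    return [Am[i][i] for i in range(len(Am))]
```

Without shifts the off-diagonal elements shrink only at a rate set by the ratio of adjacent eigenvalues, so many decompositions and products can be needed when eigenvalues are close.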

The optical realization of QR solutions to eigensystems has been detailed [29] and follows directly from Fig. 9 and the above steps. Shift algorithms [41] can be used to greatly reduce the number of matrix multiplications needed. If the matrix A is full and not symmetric, similarity transforms can only reduce A to a Hessenberg matrix (an upper triangular matrix with one additional diagonal below the main diagonal). Standard decomposition methods exist to reduce Hessenberg matrices to tridiagonal form and symmetric matrices to tridiagonal (using the QR algorithm) or bidiagonal (using SVD) matrices. The optical realizations of several of these methods have been detailed [29], [47]. The preferable solutions appear to be to use QR techniques to reduce A to tridiagonal form or one-sided SVD techniques to reduce A to bidiagonal form, and to then calculate the eigenvalues of these simplified matrices. Such methods are the subject of active current research [29], [46]-[50]. Such eigensystem solutions are preferable to power methods, for which several optical realizations [51], [52] have been described. These represent other fruitful areas for future optical systolic processor research.

Let us next advance several remarks on other triangular and orthogonal matrix decompositions and then briefly discuss SVD. Many triangular factorization techniques are possible besides LU decomposition. These include LDU, $LDL^T$, and $LL^T$ (optical realizations of each of these are quite straightforward). LU and LDU decomposition require only that A be nonsingular. If A is symmetric and positive-definite (as often occurs in signal processing), then $LDL^T$ and $LL^T$ decomposition are quite attractive since they avoid the need for pivoting in the calculations required to compute the new $P_m$. The optical realization of Cholesky ($LL^T$ or $LDL^T$) decomposition [53] has been detailed [28] and follows directly from Fig. 7. $LDL^T$ decomposition is the preferable choice in such cases since it avoids the need to form the square root, as required in $LL^T$ decomposition. Orthogonal matrix factorizations are preferable for general matrices since they are numerically stable, since there is no need for pivoting (as can be required in LU decomposition of general matrices), and because fast shifted QR algorithms (for eigensystem solution) exist.
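To make the square-root-free property concrete, the following sketch factors a symmetric positive-definite matrix as $LDL^T$ using only multiplications, subtractions, and divisions. It is a plain digital illustration with a hypothetical function name, not the optical procedure of [28].

```python
def ldlt(A):
    """Return (L, d) with A = L diag(d) L^T and L unit lower triangular."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: no square root is taken here.
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / d[j]
    return L, d
```

An $LL^T$ (Cholesky) factor would instead require $\sqrt{d_j}$ at each step, which is the operation the text identifies as worth avoiding.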

As noted in Section IV, numerically stable QR decomposition can also be achieved by modified Gram-Schmidt [53] and Givens techniques [53]. These methods can be realized optically by a sequence of matrix multiplications, with the elements of the next transformation matrix calculated after each matrix product is formed. The Householder technique appears to be the most parallel, stable, and easiest algorithm to realize optically, as quantified in Section IV-D. For digital systolic processors, Givens techniques are presently the most popular and attractive ones (this is due to the architectural differences between most digital and optical systolic processors).

As noted earlier, SVD is a powerful and useful technique for least squares, eigensystems, and high-resolution direction-finding problems. Although this is a complicated decomposition algorithm, it provides estimates of the condition number of the matrix and the number of signals present. In SVD, the matrix A is factored into three matrices, $A = PDQ^T$, where P and Q are orthogonal matrices and D is a diagonal matrix. The singular values of A are the elements of D. When applied to a least squares problem, the SVD solution $x = QD^{-1}P^T b$ is easily calculated once the SVD has been performed. Thus far, the only optical realization of SVD described [47] used 2-D modulators in an optical matrix-vector processor. Extension to optical systolic architectures appears to be rather straightforward.
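The SVD form of the least squares solution follows from the normal equations in a few lines. The derivation below is a sketch that assumes all singular values are nonzero (so $D^{-1}$ exists), that P has orthonormal columns, and that Q is square and orthogonal.

```latex
\begin{aligned}
A^{T}A\,x &= A^{T}b, \qquad A = PDQ^{T} \\
Q D P^{T} P D Q^{T} x &= Q D P^{T} b \\
Q D^{2} Q^{T} x &= Q D P^{T} b \qquad (P^{T}P = I) \\
D\,Q^{T} x &= P^{T} b \qquad (\text{multiply by } D^{-1}Q^{T}) \\
x &= Q D^{-1} P^{T} b
\end{aligned}
```

When some singular values are zero or very small, the corresponding reciprocals in $D^{-1}$ are the terms that signal rank deficiency or poor conditioning.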

A review of the wealth of linear algebra algorithms in modern signal processing is beyond the scope of this paper. However, the selected algorithms noted above and the selected applications discussed in Section VI provide a good introduction and overview to the role of optical systolic processors in modern signal processing. A summary of attractive solutions for various mathematical problems is provided in Table 3. This table is the compilation of many references [5], [26], [27], [41], [53]. The matrix solutions given are by no means complete and represent what presently appear to be the better approaches for optical systolic processors. Several general signal processing applications are then noted in Table 4. For each, an attractive problem formulation and one candidate solution are noted. As before, various other problem formulations and candidate solutions are possible beyond those listed in Table 4. However, the methods listed appear at present to be among the most attractive ones for realization on optical systolic processors.

Table 3  Attractive Solutions for Various Problems

Problem                            Matrix Features                    Attractive Solution

Solution of Time-Dependent         Banded                             Finite Differences,
PDEs [29]                                                             Implicit or Explicit

Deconvolution [10]                 Toeplitz                           Time or Space Integrating
(Stationary)                                                          Processors with Feedback

Solution of Systems of LAEs,       None                               Direct by QR
Ax = b                             Diagonally Dominant                Direct by LU
                                   Stable                             Indirect
                                   Symmetric Positive Definite        Direct by Cholesky

Eigensystem Solution,              Real-Symmetric or Complex          Householder Direct Decomposition,
Ax = λx [41], [53]                 Hermitian Symmetric                Iterative QR Algorithm
                                   Symmetric Nonnegative Definite     Direct SVD Reduction to Bidiagonal

Symmetric Generalized              A and B are Real Symmetric,        Unitary Transforms or LL^T
Eigensystem Solutions,             B is Positive-Definite
Ax = λBx [45]

Least Squares Solutions,           R = A^T A is Positive-Definite     Direct Matrix Decomposition
||Ax - b||^2                       A^T A is Nonsingular               QR Decomposition
                                   A^T A is Singular                  SVD

Table 4  One Possible Problem Formulation and Solution for Selected Specific Applications

Application                     Attractive Problem Formulation    Candidate Solutions                  Reference

High Resolution                 Symmetric Eigensystem             SVD                                  [45]
Direction Finding

State Estimation                Kalman Filter                     Recursive Least Squares              [66]
                                                                  (Square-Root Formulation)

Adaptive Noise                  Constrained Least Squares         Triangular or Orthogonal             [37]
Cancellation                                                      Decomposition

VI. Selected Applications for Optical Systolic Processors

A wealth of physical, signal processing, and control problems require various linear algebra operations and the solutions of diverse matrix equations. Brief discussions of several applications are now advanced. These are drawn from available optical systolic processing literature and are chosen and intended to demonstrate different points and features: 1) solutions of partial differential equations (PDEs) with emphasis on matrix structure and implicit and explicit solutions (Section VI-A), 2) radar and sonar applications with attention to simple adaptive filtering and the need to handle complex-valued data (Section VI-B), and 3) optimal control with attention to the solution of a nonlinear matrix equation on an optical systolic processor (Section VI-C).

A. Solution of PDEs

PDEs are the standard mathematical models for many physical problems and distributed systems in applied mechanics. For steady-state PDEs (e.g., elliptic equations), spatial discretization leads directly to LAEs, which can be solved by the indirect (Section III) or direct (Section IV) solution methods noted earlier. Time-dependent PDEs represent another major class of mathematical models. Discretization of such equations can yield implicit or explicit solutions, as we now demonstrate [29].

We consider the diffusion equation as a second-order PDE example

$$\partial u(x,t)/\partial t = c^2\, \partial^2 u(x,t)/\partial x^2 \tag{30}$$

to be solved for u(x,t) with boundary conditions u(x,0) = f(x) for $0 \le x \le L$ and u(0,t) = u(L,t) = g(t) for $0 \le t \le T$. We discretize time and space into increments $\Delta t = T/(N + 1)$ and $\Delta x = L/(J + 1)$, and denote discrete points in time and space by $n\Delta t$ and $j\Delta x$. If we apply single differencing in time and space to both sides of (30), we obtain

$$\frac{u_j^{n+1} - u_j^{n}}{\Delta t} = c^2\, \frac{u_{j-1}^{n} - 2u_j^{n} + u_{j+1}^{n}}{(\Delta x)^2} \tag{31}$$

where superscripts denote time increments and subscripts denote space increments. Rearranging (31), we obtain

$$u_j^{n+1} = \lambda u_{j-1}^{n} + (1 - 2\lambda)u_j^{n} + \lambda u_{j+1}^{n}, \qquad n \ge 0,\ 1 \le j \le J \tag{32}$$

where $\lambda = c^2 \Delta t/(\Delta x)^2$. An alternate formulation results if we apply double differencing to the space derivative on the right-hand side of (30). In this case, we obtain

$$(1 + 2\lambda)u_j^{n+1} - \lambda u_{j-1}^{n+1} - \lambda u_{j+1}^{n+1} = \lambda u_{j-1}^{n} + (1 - 2\lambda)u_j^{n} + \lambda u_{j+1}^{n} \tag{33}$$

Let us now consider and compare the use of (32) or (33) to solve for u(x,t) as $u_j^{(1)}$ (at $t = 1\Delta t$ for all $1 \le j \le J$), then $u_j^{(2)}$ (at $t = 2\Delta t$ for all j), etc. From (32), calculation of $u_j^{(n+1)}$ for all j requires a simple matrix-vector multiplication $u^{(n+1)} = Au^{(n)}$, where A is tridiagonal with elements $\lambda$, $(1 - 2\lambda)$, and $\lambda$ along the three diagonals (and where $u^{(n)}$ is known from boundary conditions or from the calculations at the prior $t = n\Delta t$ time step). However, for the single-differencing approximation to be a good approximation, $\Delta x$ must be small, and for stability $0 < \lambda \le 0.5$ is necessary; thus a large number of very small time steps $\Delta t$ are needed to produce accurate results. Hence, such explicit solutions (which initially look quite attractive because they require only a matrix-vector multiplication to obtain the data at the next time step) can, in practice, require many small $\Delta x$ samples and many small $\Delta t$ time steps, and thus a significant number of matrix multiplications.
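The explicit update is simple enough to state directly in code. The sketch below applies the three-term recursion of (32) to the interior samples, with the two boundary values supplied separately; the function name and argument layout are assumptions for this example, and `lam` stands for the parameter $\lambda = c^2\Delta t/(\Delta x)^2$.

```python
def explicit_step(u, lam, left, right):
    """Advance the interior values u one time step using the explicit rule.

    u     : interior samples at the current time step
    lam   : c^2 * dt / dx^2
    left  : boundary value at x = 0
    right : boundary value at x = L
    """
    padded = [left] + list(u) + [right]
    return [lam * padded[j - 1] + (1 - 2 * lam) * padded[j] + lam * padded[j + 1]
            for j in range(1, len(padded) - 1)]
```

Each call is equivalent to one tridiagonal matrix-vector product. Repeating the call with `lam` above 0.5 makes the highest spatial frequency grow from step to step, which is the instability the stability bound guards against.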

Let us next consider the Crank-Nicolson algorithm as formulated in (33). In this case, $u^{(n+1)}$ is calculated from $u^{(n)}$, which is known from boundary conditions or calculations at the prior $t = n\Delta t$ time step, by solving the LAEs $Au^{(n+1)} = b^{(n)}$. Thus the explicit solution in (32) requires only a matrix-vector multiplication, whereas the implicit solution in (33) requires the solution of a system of LAEs at each time step. An implicit solution is still attractive because it is unconditionally stable and has second-order accuracy, and because the number of matrix multiplications in the explicit solution may be quite large. Since the matrix is tridiagonal, solutions to such LAEs can become quite simple using, for example, the system of Fig. 2. In many cases, the coefficient c in (30) is constant or is slowly varying with time, and hence so is the matrix A. In such cases, matrix decomposition or direct solutions are quite attractive, since the matrix decomposition need be performed only once; thereafter the simplified triangular systems solution can be used with a different exogenous vector. This would require only a matrix-vector multiplication at each time step, as in the explicit algorithm in (32).
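A minimal sketch of one implicit time step is given below. It forms the right-hand side from the current samples, then solves the tridiagonal system with diagonal $1 + 2\lambda$ and off-diagonals $-\lambda$ by the standard Thomas algorithm (a digital stand-in, not the optical system of Fig. 2). The function names are hypothetical, and the boundary values are assumed to be the same at the old and new time steps.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def implicit_step(u, lam, left, right):
    """One implicit step with fixed boundary values left and right."""
    n = len(u)
    padded = [left] + list(u) + [right]
    rhs = [lam * padded[j - 1] + (1 - 2 * lam) * padded[j] + lam * padded[j + 1]
           for j in range(1, n + 1)]
    # Known boundary terms at the new time step move to the right-hand side.
    rhs[0] += lam * left
    rhs[-1] += lam * right
    return solve_tridiagonal([-lam] * n, [1 + 2 * lam] * n, [-lam] * n, rhs)
```

Because the coefficient matrix does not change when `lam` is constant, the elimination factors computed inside `solve_tridiagonal` could be stored once and reused at every time step, which is the point made about performing the decomposition only once.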

Trapezoidal, Runge-Kutta, and other difference approximations in space and time are also possible. These will yield different forms for (32) and (33), but with similar matrix structures and conclusions. In general, all discretizing methods will yield implicit or explicit solutions with banded matrices. Thus many physical problems directly result in matrix-vector problems with quite structured matrices. Deconvolution applications [10] are yet another case in which structured matrices result. In such cases, the received signal c(t) is the convolution of the original signal a(t) with the impulse response h(t) of the transmitting medium. In terms of discrete signal samples, $c_m = \sum_n h_{m-n} a_n$ (where the summation is over the range of sample points). The matrix-vector form is c = Ha (where the matrix H is Toeplitz and contains elements $h_{m-n}$). Thus to recover a given h and c requires the solution of a system of LAEs with a Toeplitz matrix. The Toeplitz matrix structure will exist for linear shift-invariant distortions, and its bandwidth will depend upon the length of the impulse response function. In such cases, the architectures of Figs. 2 and 3 with appropriate feedback as in Figs. 5 or 6 can be employed. A variety of applications thus exist for structured matrix and LAE solutions. The best solution, algorithm, and architecture depend upon the specific problem and application. However, implementation methods for the basic algorithms and architectures have been described (Sections II-IV).
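The Toeplitz structure of the deconvolution problem can be illustrated with a short sketch. It assumes a causal impulse response whose first sample is nonzero, so that the matrix is lower triangular and the system can be solved by forward substitution; the function names are hypothetical and this is not the feedback architecture of Figs. 5 or 6.

```python
def toeplitz_from_impulse(h, n):
    """Lower triangular Toeplitz matrix with H[m][k] = h[m - k]."""
    return [[h[m - k] if 0 <= m - k < len(h) else 0.0 for k in range(n)]
            for m in range(n)]


def convolve_matrix(H, a):
    """Matrix-vector form of the convolution, c = H a."""
    return [sum(H[m][k] * a[k] for k in range(len(a))) for m in range(len(H))]


def deconvolve(h, c):
    """Recover a from c = H a by forward substitution (needs h[0] != 0)."""
    a = []
    for m in range(len(c)):
        acc = sum(h[m - k] * a[k] for k in range(m) if m - k < len(h))
        a.append((c[m] - acc) / h[0])
    return a
```

The bandwidth of the matrix equals the length of `h`, so a short impulse response gives a narrow band and little work per recovered sample.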


B. Simplified Adaptive Noise Cancellation

One of the original motivations and applications for optical matrix-vector processors was adaptive phased-array radar processing [54]. This application [55] introduced the original iterative optical matrix-vector algorithm in (6). We briefly consider the calculations required to obtain the set of adaptive weights w for an adaptive phased array to steer the antenna in a direction defined by the vector s and to null the noise field defined by the covariance matrix M. This problem is the basis for much of radar and sonar beam forming. In the simple case of a linear array of N evenly spaced antenna elements, the received signal $v_n(t)$ at antenna element n is multiplied by an appropriate weight $w_n$. For the full antenna, the output signal for one set of weights is

$$s(t) = \sum_{n=1}^{N} w_n v_n(t)$$

The weights defined by the vector w are chosen to control the antenna beam pattern $E(\theta)$, where $\theta$ is the angle at which the beam is steered. In general, w is complex-valued and varies with time in conjunction with the noise environment. w is chosen to null the noise sources within the antenna's field of view at the desired angles and frequencies, and to produce a peak at the desired steering direction. Hence, a least squares formulation is appropriate, as noted in Section V.

Various adaptive control loops are possible to achieve this. Their operation and convergence are detailed elsewhere [54]. In vector notation, the dynamic behavior of one type of adaptive control loop and weight vector is described by

$$\tau \dot{w}/G + (M + I/G)\,w = s^* \tag{34}$$

where $\tau$ is the time constant of a low-pass filter and G is the gain within the control loop. The covariance matrix has elements $m_{ij} = \langle v_i^*(t)\, v_j(t) \rangle$, where $\langle \cdot \rangle$ denotes a time average. Assuming that the gain G is large, then in steady state ($\dot{w} = 0$), (34) reduces to $Mw = s^*$, and the set of adaptive weights w is given by the solution $w = M^{-1}s^*$ to a set of LAEs. The various algorithms described in Sections III and IV and the various optical systolic architectures in Section II are suitable for solution of such problems. The least squares, SVD, and eigensystem solutions noted in Section V are presently the most attractive methods for such beam-steering problems.

Matrix inversions will arise in various applications. At this point, we note that the architecture and algorithm depicted in Figs. 9 or 11 can perform matrix inversion if the augmented vector b is replaced by the identity matrix I of order equal to that of A. In this case, the system solves the matrix-matrix equation BA = I for $B = A^{-1}$ by a direct algorithm. A parallel iterative algorithm [13] for matrix inversion on the system of Fig. 6 is also possible by modifying the algorithm in (6). To develop this algorithm, we consider the solution of a general matrix-matrix equation C = AB for $B = A^{-1}C$ by a new indirect method. The conventional iterative algorithm is rewritten as

$$B_{k+1} = \omega\,(A' B_k + C) \tag{35}$$

where $A' = (I/\omega - A)$. Calculation of $A'$ is trivial and (35) is considerably simpler to implement than the conventional matrix extension of (6).
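The iterative matrix-matrix solution can be sketched as the fixed-point recursion $B_{k+1} = \omega\,(A'B_k + C)$ with $A' = I/\omega - A$, whose fixed point satisfies AB = C. In the sketch below the function names, the zero starting matrix, and the fixed step count are assumptions for the example; convergence requires that $\omega$ be chosen so that the spectral radius of $I - \omega A$ is below one, a condition the code does not check.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def iterate_solve(A, C, omega, steps):
    """Iterate B <- omega * (A' B + C), with A' = I / omega - A."""
    n = len(A)
    cols = len(C[0])
    a_prime = [[(1.0 / omega if i == j else 0.0) - A[i][j] for j in range(n)]
               for i in range(n)]
    B = [[0.0] * cols for _ in range(n)]  # arbitrary starting guess
    for _ in range(steps):
        P = matmul(a_prime, B)
        B = [[omega * (P[i][j] + C[i][j]) for j in range(cols)]
             for i in range(n)]
    return B
```

Passing the identity matrix as `C` yields an approximation to the inverse of `A`, and each iteration costs one matrix-matrix product plus a matrix addition.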

Extensions of adaptive phased-array radar processing to provide multidimensional adaptation in velocity or time as well as angle or space have also been described [56]. The matrix vectorizing methods discussed in Section VI-C are quite useful in such extensions. Calculation of antenna weights also introduces the issue that the vector will have complex-valued elements. Various methods for handling matrices and vectors with bipolar-valued and complex-valued elements have been described. These include space, time, frequency, or wavelength multiplexing. The most popular general methods for representing complex-valued data are by a four-tuple representation (the positive- and negative-valued real and imaginary parts) or a three-tuple representation (each complex number is represented by its three projections on the 0°, 120°, and 240° axes in the complex plane).
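Both representations map a complex number onto nonnegative components, which is what an intensity-based optical channel can carry. The sketch below shows one possible encoding of each; the function names are hypothetical, and the particular scaling and the subtraction of the smallest projection in the three-tuple case are choices made for this example (they rely on the three unit axes summing to zero), not a convention taken from the text.

```python
import cmath
import math

# Unit vectors along the 0, 120, and 240 degree axes.
AXES = [cmath.exp(2j * math.pi * k / 3) for k in range(3)]


def to_four_tuple(z):
    """Positive and negative parts of the real and imaginary components."""
    return [max(z.real, 0.0), max(-z.real, 0.0),
            max(z.imag, 0.0), max(-z.imag, 0.0)]


def to_three_tuple(z):
    """Nonnegative weights on the three axes that sum back to z."""
    proj = [(2.0 / 3.0) * (z * axis.conjugate()).real for axis in AXES]
    shift = min(proj)  # adding a constant to all three weights leaves z unchanged
    return [p - shift for p in proj]


def from_three_tuple(a):
    """Recombine the three weights into a complex number."""
    return sum(ak * axis for ak, axis in zip(a, AXES))
```

The three-tuple form needs one fewer channel per complex value than the four-tuple form, at the cost of a slightly less direct encoding.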

Other classical signal processing operations can also be described as matrix operations, some as vector outer products rather than matrix-vector multiplications. In signal processing applications, the matrix used is generally the covariance matrix. It will be real, symmetric, and nonnegative definite (for real random vectors) or Hermitian symmetric and nonnegative definite (for complex vectors). Hence, different matrix properties will result in different applications, and appropriate decomposition algorithms utilizing the matrix features should be employed. Calculation of the ambiguity function [57] of two signals is a classical signal processing operation. In its discrete form, it can be described [44] as the product of three matrices (one matrix being Toeplitz and another being diagonal). The required cross-ambiguity function can thus be calculated on a matrix processor by performing the indicated triple-matrix product. Detailed algorithms and optical systolic processors for these and other advanced signal processing functions described here and in Section V are the subject of current research.

C.  Optimal  Control,  State  Estimation,  and  Kalman  Filtering 

State estimation and Kalman filtering applications are among the most demanding ones for which advanced highly parallel optical systolic processors with very high computational rates are needed. The basic operations required in Kalman filtering are well known [58]. They include triple-matrix products, matrix inversions, and the solutions of nonlinear matrix equations [13]. Algorithms and architectures to achieve all of these operations (except the last one) have been described earlier in this paper. We thus now advance a new algorithm for solving nonlinear matrix equations on optical systolic processors. This will enable our repertoire of operations achievable on an optical systolic processor to include all of the basic operations needed for state estimation and Kalman filtering.

For the case when the noise statistics are known, a simple two-channel optical systolic processor design has been advanced for steady-state Kalman filter computations [59]. In the more important case of a fully adaptive Kalman filter, the sequence of operations necessary and the required processor architecture, as well as the flow of operations, are far more complicated. The full solution to this problem for an extended Kalman filter has been detailed [60], [61] for an air-to-air missile guidance controller. In this case, a Newton-Raphson solution was employed to solve the nonlinear matrix equation, and calculation of the Jacobian was achieved by an efficient digital table lookup method or by a new optical systolic processor as described in the references noted. A more general optical systolic method to solve nonlinear quadratic matrix equations is described below [36], [62], [63].

The specific application considered is the solution of a linear quadratic regulator (LQR) problem of modern control engineering, in which the control signals u(t) that minimize a quadratic cost-performance index for the general linear system model

$$dx/dt = Fx(t) + Gu(t) \tag{36}$$

are desired. The solution is

$$u(t) = -Kx(t) \tag{37}$$

where the LQR feedback gain matrix K is computed as

$$K = R^{-1}G^T S \tag{38}$$

and the symmetric matrix S is the solution of the algebraic Riccati equation (ARE)

$$SF + F^T S - SLS + Q = 0 \tag{39}$$

where $L = GR^{-1}G^T$. Selection of this application was motivated by the availability of all of the necessary matrices for the F100 turbofan jet engine, thus allowing specific quantitative data to be obtained and used. The key step in the calculation of u(t) in (37) is solving the quadratic matrix equation (39) for S. Hence, we concentrate on one solution method recently developed [63]. This will result in an optical systolic system realization of earlier algorithms [36], [62] devised for an optical matrix-vector processor using a 2-D light modulator.

The solution S to (39) is devised beginning from the classical Newton-Raphson
algorithm. Substituting the ARE into the Newton-Raphson solution, the iterative
algorithm

$$ S(k) F(k) + F^{T}(k) S(k) = -S(k-1)\, L\, S(k-1) - Q \tag{40} $$

results, where k denotes the iteration index and where

$$ F(k) = F - L S(k-1). \tag{41} $$

This is known as the Kleinman algorithm [64]. Noting that the right-hand side of
(40) is known from the value S(k - 1) at the prior (k - 1) iteration, we see that
(40) is linear in S(k). The Kleinman algorithm has thus converted the nonlinear
quadratic matrix equation in (39) into the linear equation in S in (40). We also
note that (40) has the form of the Lyapunov equation and that solutions to this
equation using the Kronecker or tensor product and vectorization exist [65].

To convert (40) to LAEs, we vectorize the matrix on the right-hand side of (40) by
lexicographically ordering the matrix elements. The resultant column vector is
denoted by y(k). The vectorized column vector associated with S(k) is denoted by
x(k). Equation (40) can now be described by the system of LAEs

$$ H(k)\, x(k) = y(k) \tag{42} $$

where H(k) is a matrix with specific block structure as detailed elsewhere.

The steps in solving for the matrix S in the form of the vector x in (42) thus
involve at step k: a) evaluation of F(k) in (41), the right-hand side y(k) of (40),
and formatting of H(k) in (42), and b) the solution of the LAEs in (42) for x(k).
The steps in a) involve simple matrix-vector and matrix-matrix multiplications. For
the steps in b), indirect or direct algorithms (Sections III and IV) can be used.
For the case when an indirect solution to (42) is used, the solution x (or S in
(39)) can be described by the two-loop iterative algorithm

$$ x(r+1, k) = [\,I - \omega(k) H(k)\,]\, x(r, k) + \omega(k)\, y(k). \tag{43} $$

For a fixed k, (43) is the Richardson algorithm in (6) used to solve (42) for x(k).
We use r to denote iterations in the Richardson algorithm solution to the LAEs in
(42). H(k) and y(k) are then updated and new LAEs in (42) are obtained. We denote
the iterations of the Kleinman algorithm by the index k. The solution described to
a quadratic matrix equation thus employs an inner iterative loop (implementing the
Richardson algorithm for solutions to LAEs) and an outer loop (implementing the
Kleinman algorithm, to update the LAEs). These iterations continue until the
solution x(k + 1) = x(k) = x is obtained. Direct LAE solutions to (42) and other
solutions (more complicated to formulate) to (40) are of course possible. The
algorithm described above is one example of the class of operations and algorithms
possible on optical systolic processors for advanced signal processing applications.
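
The two-loop procedure can be illustrated numerically with a minimal digital sketch. It is not the optical realization and not the authors' code: the 2 x 2 test system, the function name kleinman_richardson, the tolerances, the starting value S(0) = 0 (valid here only because the chosen F is already stable), and the choice omega(k) = -1/||H(k)|| are all assumptions made for illustration. H(k) is formed with Kronecker products for row-wise (lexicographic) ordering of S.

```python
import numpy as np

def kleinman_richardson(F, G, R, Q, outer=20, inner=5000, tol=1e-12):
    """Solve S F + F^T S - S L S + Q = 0 with an outer Kleinman loop
    and an inner Richardson loop (illustrative sketch only)."""
    n = F.shape[0]
    I = np.eye(n)
    L = G @ np.linalg.solve(R, G.T)          # L = G R^-1 G^T
    S = np.zeros((n, n))                     # assumed start; requires a stable F
    for k in range(outer):
        Fk = F - L @ S                       # update of the system matrix
        y = (-S @ L @ S - Q).flatten()       # right-hand side, row-wise ordering
        # Row-wise vectorization of S Fk + Fk^T S gives H x with:
        H = np.kron(I, Fk.T) + np.kron(Fk.T, I)
        omega = -1.0 / np.linalg.norm(H, 2)  # assumed acceleration parameter
        x = S.flatten()
        for r in range(inner):               # inner Richardson iterations
            x_new = x - omega * (H @ x) + omega * y
            done = np.linalg.norm(x_new - x) < tol
            x = x_new
            if done:
                break
        S_new = x.reshape(n, n)
        converged = np.linalg.norm(S_new - S) < 1e-10
        S = S_new
        if converged:
            break
    return S

# Hypothetical 2-state, 1-control test system (not the F100 data).
F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
Q = np.eye(2)
S = kleinman_richardson(F, G, R, Q)
K = np.linalg.solve(R, G.T @ S)              # feedback gain K = R^-1 G^T S
```

The acceleration parameter is negative in this sketch because the eigenvalues of H(k) have negative real parts when F(k) is stable; other choices of omega(k) are possible.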

VII. Discussion, Summary, and Conclusions

The accuracy and performance of any optical processor is always an issue of concern.
If the performance of the analog architectures described is not sufficient, they
can be extended to digital-optical architectures as noted earlier and elsewhere. In
all instances, the error source modeling and performance measures used merit
attention. The conventional roundoff error analysis available for many digital
linear algebra algorithms is not appropriate for optical systolic processors, whose
errors (such as spatial nonuniformities in the input, AO cell and detector planes,
plus detector noise) are considerably different in nature. Initial modeling of such
error sources in optical processors has been accomplished [25], and the results are
applicable to both analog and digital optical systolic processors. The appropriate
performance measure used is also of concern. This will depend on the purpose of the
processor. For general systolic array processors, the average or maximum error in
any one element of the computed matrix or vector is one performance measure. In
specific applications, different performance measures can be defined. For the LQR
example defined in Section VI-C, the accuracy of the closed-loop poles of a
controlled system matrix is an appropriate performance measure (since these poles
describe the transient response of the closed-loop system). In some cases, such as
many adaptive noise cancellation applications, the set of adaptive weights may only
need to be computed to 1-percent accuracy or so. Such issues merit attention in
both analog and digital optical systolic processors.

The algorithms selected and used can also significantly affect the performance
obtained in both analog and digital optical systolic processors. Parallel
algorithms are essential, and not all algorithms have yet seen parallel
realizations. The specific algorithm used must often be selected to match the
specific optical systolic architecture. In all instances, robust and stable
algorithms are essential and specific attention should be given to selecting
algorithms that do not increase the condition number (and hence the accuracy
requirements) of the original problem.

The solutions of LAEs, least-squares problems, and eigensystems are essential
problems in signal processing. The major direct and indirect algorithms to solve
LAEs were noted. For general-purpose processors, direct algorithms are often
preferable since the number of iterations and processing time required is known.
For specific applications, indirect algorithms are acceptable. Direct algorithms
appear to require more precision at each multiplication step than do indirect
solutions; however, they will then also provide more accurate results. In general,
it is necessary to employ improved algorithms and attention to specific
applications to fully address such issues. Iterative solutions to nonlinear
equations, eigensystems, and large matrices are still the preferable and often the
only approach.

In this paper, many optical systolic architectures have been reviewed and several
architectures detailed. Attention was given to architectures for matrices with
specific structure (banded, Toeplitz, and triangular), and to matrices with general
structure. The solution of LAEs, least-squares problems, and eigensystems were
selected as the most fundamental problems. It is quite significant that one optical
systolic architecture can achieve all of the basic operations required. Efficient
digital systolic architectures have thus far required a new mesh connection for
different functions. The use of various indirect and direct algorithms and
associated optical systolic architectures to realize each were described and
discussed. Several specific applications were detailed to demonstrate the many
diverse linear algebra problems and operations that emerge. These included the
solution of partial differential equations, adaptive noise cancellation, and the
basic operations required in state estimation and Kalman filtering.

The field of optical systolic processors is quite young and active. Many
architectures, parallel algorithms, and systems with potentially high computational
rates above 10° multiplications per second have been suggested. In several
instances, prototype systems have been fabricated, and in other instances
commercially available architectures are being fabricated. Considerable system
fabrication, algorithm, and application-directed research remains. All present
indicators promise a bright future for this newest topic in optical computing.
Optical systolic array processors achieve the flexibility and general-purpose
features (that have escaped prior systems), the accuracy and performance (that have
eluded prior approaches to optical computers), and such architectures can be
fabricated with available components (at competitive cost, size, weight, and power
dissipation specifications).
ABSTRACT

Direct and indirect solutions to linear algebraic equations (LAEs) are considered with
attention to the use of optical acousto-optic (AO) systolic array processors. Specific
attention is given to error sources in one AO systolic processor. A case study of an LAE
solution is conducted. The first error source model for an optical systolic array
processor is advanced. Using this and digital computer modeling, a direct solution is
found to be less sensitive to various optical system error sources than is an indirect
solution. Acoustic attenuation is found to be the dominant error source in the AO
systolic array processor considered. Related error source remarks on different bipolar
data representation schemes and on optical versus digital solutions to a triangular
system of equations are also advanced.

1.  INTRODUCTION 

Optical linear algebraic processors are currently receiving considerable attention [1-16].
These architectures vary from simple optical systems that compute matrix-vector products
[1-2] to iterative optical processors [3-4] that solve matrix-vector equations or LAEs.
Newer architectures using AO light modulators [5-7] are more attractive and can be
fabricated with presently available components. These architectures [5-7] and more
advanced ones using 2-D CCD-addressed liquid crystals [8] represent yet another class of
optical linear algebra systems known as optical systolic array processors. This paper
focuses on only the discussion of one specific architecture. We have selected the
frequency-multiplexed AO architecture [7] for our specific case study in this paper.
Extensions of this frequency-multiplexed AO architecture have been described for the
optical solution of: nonlinear matrix equations [9-10], LAE solutions by
matrix-decomposition [11-13] and the solution of the resultant lower or upper triangular
system of equations [14].

In this paper, we consider only AO systolic processors and specifically only the
frequency-multiplexed optical system (this decision is made because the architecture
allows more flexibility in the data format possible and in the operations achievable on
the system). In this paper, we concentrate on various possible optical and digital
solutions to LAEs. Attention is specifically given to the error sources present in
optical systems. This subject has not received attention previously. Other techniques to
achieve increased accuracy by encoding of the data to be processed using various methods
are not addressed (such architectures generally result in a significant reduction in the
number of operations possible per second and in an increased complexity in the output
detector array). Similarly, vector-outer product optical processors are not addressed
(since they require the readout of an entire 2-D output matrix of data every bit time
T_B).

In Section 2, we briefly review the AO frequency-multiplexed architecture and several of
the different operations that it can achieve. Attention is given to iterative (or
indirect) and direct (specifically matrix-decomposition) solutions to LAEs. When direct
techniques are used, the final step required is the solution of a triangular system of
equations. In Section 2, we note that this is also possible both optically and
digitally. In Section 3, we advance the first error source and component model for an
optical systolic array processor using AO devices. In Section 4, we discuss how this
model is incorporated into a digital simulator to model and analyze the effects of the
different error sources present in such advanced data processors. We also advance
initial remarks on the effects of different data encoding schemes for representation of
bipolar data (with attention to the effect that optical system and component error
sources and noise have on the resultant performance and accuracy). In Section 5, we
present initial results obtained for an optical direct and indirect solution of a system
of LAEs. We also consider a hybrid optical and digital direct solution to an LAE
problem. We quantify the dominant system and component error sources found and the
performance and accuracy achievable. Conclusions, guidelines and a summary are then
advanced in Section 6.

2.  FREQUENCY-MULTIPLEXED  AO  SYSTOLIC  PROCESSOR 

The basic frequency-multiplexed systolic AO array processor (SAOP) [7] to be considered
is shown schematically in Figure 1. It consists of a linear array of point modulators
imaged through separate spatial regions of an AO cell with the Fourier transform of the
resultant data collected on an output linear detector array. The point modulator inputs
can be time and space multiplexed and the AO cell inputs can be time and
frequency-multiplexed. This enables this system to perform matrix-matrix multiplications
with one matrix-vector product (one column or row of a matrix-matrix product) produced
in parallel every bit time T_B. The bit time T_B is the time required for the AO cell
data to propagate between two spatially adjacent regions of the AO cell. This time T_B
also sets the rate at which new data can be fed in parallel to the AO cell and to the
linear point modulator input array.

FIGURE 1. Schematic diagram of a frequency-multiplexed AO systolic array processor.
To describe the operation of this system most simply, we consider its use in the
calculation of the 3 x 3 matrix-matrix product AB = C.
For now, we consider the case when time and space multiplexing of the rows and columns
of A and time and frequency-multiplexing of the rows or columns of B is performed, as
noted in (1). After 3T_B, the entire B matrix is present in the lower 3T_B of the AO
cell. Point modulator inputs 3-5 are now pulsed on with the first row of A. The first
row of the matrix-matrix product AB = C is then produced immediately on the output
detector array in parallel. At the next T_B, the data input to the AO cell is shifted up
by T_B. We now pulse on point modulators 2-4, with the data input being the next row of
the matrix A, and immediately obtain the second row of C at the output of this system.
This procedure is repeated until all rows of the matrix-matrix product have been
produced.
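
The data flow just described can be mimicked with a short digital sketch. It is only an idealized, error-free illustration under stated assumptions: the AO cell is modeled as an array of 2N - 1 slots (rows) by N frequency channels (columns), each detector is taken to sum the light in one frequency channel, and the name saop_matmul is hypothetical.

```python
import numpy as np

def saop_matmul(A, B):
    """Idealized sketch of the systolic data flow: B slides through the
    AO cell one slot per bit time while rows of A are pulsed on the
    point modulators facing it."""
    N = A.shape[0]
    P = 2 * N - 1                        # number of point modulators / cell slots
    cell = np.zeros((P, N))
    cell[N - 1:, :] = B                  # after N bit times B fills the lower N slots
    C = np.zeros((N, N))
    for t in range(N):
        mods = np.zeros(P)
        # For N = 3 this pulses modulators 3-5, then 2-4, then 1-3 (1-indexed).
        mods[N - 1 - t: 2 * N - 1 - t] = A[t]
        C[t] = mods @ cell               # one row of C per bit time, in parallel
        cell = np.roll(cell, -1, axis=0) # data shifts up by one slot each bit time
        cell[-1] = 0.0                   # nothing new enters in this sketch
    return C

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [3.0, 1.0, 0.0]])
C = saop_matmul(A, B)
```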


We now briefly describe several other data formats and applications of this basic
matrix-matrix or matrix-vector processor architecture. In general, with 2N - 1 LEDs and
with an AO cell with a time-aperture T_A = (2N - 1)T_B, we form N vector inner products
on N-element vectors every T_B (all in parallel). As we have previously shown [7,12-16],
pipelining and the flow of data and operations are quite ideal in this system
architecture.


2.1  INDIRECT  (ITERATIVE)  SOLUTIONS  OF  LAEs 


For an indirect or iterative solution to the LAE

$$ A b = c \tag{2} $$

for

$$ b = A^{-1} c, \tag{3} $$

we prefer the iterative Richardson algorithm [4,6,7,15]. In this case, we use the basic
optical matrix-vector multiplication system in Figure 1 in conjunction with a parallel
analog adder and feedback of the output directly into the AO cell. This configuration,
described in [7], realizes the iterative algorithm

$$ b(j+1) = b(j) - \omega A\, b(j) + \omega c, \tag{4} $$

where j denotes the iterative index or time-step, and where ω is the acceleration
parameter, which is selected as described in [4]. When b(j+1) = b(j), equation (4) has
converged to (2) and the output b is the desired solution in (3). To achieve this [7],
we frequency-multiplex the rows of A and time-sequentially multiplex the columns of B.
As we will see later, recycling of A within an AO cell of length T_A = N T_B is
preferable to the use of a longer AO cell length T_A = (2N - 1)T_B.
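
As a minimal digital sketch of the iteration in (4), assuming a small symmetric positive-definite test matrix and an acceleration parameter set to the reciprocal of the largest eigenvalue (the test data, the function name richardson, and the stopping tolerance are illustrative assumptions, not values from this paper):

```python
import numpy as np

def richardson(A, c, omega, iters=500, tol=1e-12):
    """Iterate b(j+1) = b(j) - omega*A*b(j) + omega*c until b stops changing."""
    b = np.zeros_like(c, dtype=float)
    for j in range(iters):
        b_next = b - omega * (A @ b) + omega * c
        if np.linalg.norm(b_next - b) < tol:
            return b_next, j + 1
        b = b_next
    return b, iters

# Hypothetical test problem.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, 2.0])
omega = 1.0 / np.max(np.abs(np.linalg.eigvals(A)))  # assumed choice of omega
b, n_iter = richardson(A, c, omega)
```

Each pass through the loop corresponds to one matrix-vector product plus one vector addition, which is the operation the optical system with feedback performs every iteration.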


2.2  BANDED  MATRIX  AND  TRIANGULAR  SYSTEM  SOLUTION 


If we feed the vector b output back to the AO cell and if one row of A is fed to the
LEDs in parallel at one time [14], then the same architecture in Figure 1 is ideal for
the solution of banded matrix problems and a triangular LAE solution.


2.3  DIRECT  SOLUTIONS  (MATRIX-DECOMPOSITION) 


If we frequency-multiplex the columns (rather than the rows) of the matrix A and if we
feed one column (rather than one row) of the matrix B to the LEDs in parallel, then we
form the matrix-matrix product BA at the output (rather than the matrix-matrix product
AB) [7]. This data encoding approach is ideal [12,13] for matrix-decomposition
algorithms (the basic step in a direct solution of an LAE). In [12] and [13], we detail
this matrix-decomposition procedure for the cases of LU (Gauss elimination), QR, and
Cholesky decompositions.
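
For orientation, a direct solution consists of a decomposition step followed by triangular solves. The following is a purely digital sketch of LU decomposition without pivoting and of forward and back substitution; it is not the optical procedure of [12,13], and the function names and test matrix are illustrative assumptions.

```python
import numpy as np

def lu_decompose(A):
    """LU (Gauss elimination) without pivoting; assumes nonzero pivots."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def forward_sub(L, c):
    """Solve the lower triangular system L y = c."""
    n = len(c)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (c[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    """Solve the upper triangular system U b = y."""
    n = len(y)
    b = np.zeros(n)
    for i in reversed(range(n)):
        b[i] = (y[i] - U[i, i + 1:] @ b[i + 1:]) / U[i, i]
    return b

# Hypothetical test problem.
A = np.array([[4.0, 3.0, 2.0], [2.0, 1.0, 3.0], [3.0, 4.0, 1.0]])
c = np.array([1.0, 2.0, 3.0])
L, U = lu_decompose(A)
b = back_sub(U, forward_sub(L, c))
```

In each substitution step, every newly computed element is immediately reused in the next inner product, which mirrors the feedback of output data described for the triangular solution.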

2.4 DATA FLOW

Moreover,  we  showed  earlier  that  the  pipelining  and  data  flow  in  such  an  approach 
is  attractive  (the  same  remarks  apply  to  the  indirect  algorithms  described  in  Section 
2.1).  Specifically,  every  bit  time  TB,  one  time-slot  of  data  leaves  the  AO  cell  and  a  new 
time-slot  of  data  must  be  entered  into  the  AO  cell.  With  the  aperture  time  TA  of  the  AO  cell 
properly  chosen  for  a  given  problem,  we  find  that  the  parallel  output  detected  data  can  be 
operated  upon  and  fed  back  immediately  to  the  AO  cell  input.  Thus,  in  the  realization  of  all 
of  the  algorithms  we  describe,  the  output  data  are  immediately  fed  back  into  the  system  as 
they  are  produced. 

2.5  MATRIX  INVERSION 


The data encoding in Section 2.1 is also appropriate to allow matrix inversion on this
system. This aspect of this processor was fully detailed in [7]. It is thus not
discussed in further detail here.


3.  ERROR  SOURCE  MODEL 

Our component error source model is summarized in Table 1. We consider calculation of
the matrix-vector product Ab = c. We separate all component errors into input plane, AO
cell plane and detector plane errors, and we denote each by a separate superscript as
noted in Table 1. We denote the spatial coordinates of the input plane and the AO cell
by the subscript i and the frequency coordinate of the AO cell and the output detectors
by the subscript j.


ERROR SOURCE                        NOTATION
------------------------------------------------------------
Spatial Errors                      Subscript i
Frequency Errors                    Subscript j
Input Plane Errors                  Superscript 1
AO Cell Errors                      Superscript 2
Detector Plane Errors               Superscript 3

INPUT PLANE ERRORS
Point Modulator Spatial Gain        1 + \delta_{i1}^{(1)}
Nonuniform Response                 1 + \delta_{i2}^{(1)}
Coupling (Spatial)                  1 + \delta_{i3}^{(1)}

AO CELL PLANE ERRORS
Amplifier Errors                    1 + \delta_{1}^{(2)}
Spatial Response                    1 + \delta_{2}^{(2)}
AO Transfer Function                H(f_j)
Acoustic Attenuation                exp(-\alpha x)

DETECTOR PLANE ERRORS
Spatial Response                    1 + \delta_{j}^{(3)}
Dark Current                        d_j
Time-Varying Noise                  n_j(t)

TABLE 1. SAOP ERROR SOURCE MODEL


For the input plane, we note that the light intensity incident on the AO cell with all
errors included can be described by the factor

$$ \hat{b}_i = b_i \left( 1 + \delta_{i1}^{(1)} + \delta_{i2}^{(1)} + \delta_{i3}^{(1)} \right), \tag{5} $$

where b is the point-modulator vector input data. Similarly, the space i and frequency j
transmittance of the AO cell for the matrix element a_{ji} is described by

$$ \hat{a}_{ji} = a_{ji} \left( 1 + \delta_{1}^{(2)} + \delta_{2}^{(2)} \right) H(f_j) \exp(-\alpha x_i). \tag{6} $$

Likewise, the actual detector plane output \hat{c}_j (the observed output) is related to
the exact c_j value and the other error source parameters by

$$ \hat{c}_j = c_j \left( 1 + \delta_{j}^{(3)} \right) + d_j + n_j(t). \tag{7} $$

Combining all of these factors in (5)-(7), we note that in general all of the error
sources must be small (this is realistic and necessary to obtain reasonable accuracy in
such a processor). In this case, we can describe the observed detector output \hat{c}_j
in terms of the exact inputs a_{ji} and b_i as

$$ \hat{c}_j = \sum_i a_{ji} b_i (1 + \epsilon_i)\left(1 + \delta_{j}^{(3)}\right) H(f_j)\, e^{-\alpha x_i} + d_j + n_j(t), \tag{8} $$

where

$$ \epsilon_i = \delta_{i1}^{(1)} + \delta_{i2}^{(1)} + \delta_{i3}^{(1)} + \delta_{1}^{(2)} + \delta_{2}^{(2)}. \tag{9} $$


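For a (2 x 2) matrix, the observed outputs (\hat{c}_1, \hat{c}_2)^T are related to the
various component error sources in Table 1 by

$$
\begin{bmatrix} \hat{c}_1 \\ \hat{c}_2 \end{bmatrix}
=
\underbrace{\begin{bmatrix} 1+\delta_1^{(3)} & 0 \\ 0 & 1+\delta_2^{(3)} \end{bmatrix}}_{\text{DET SPAT ERRORS}}
\underbrace{\begin{bmatrix} H(f_1) & 0 \\ 0 & H(f_2) \end{bmatrix}}_{\text{AO FREQ RESP}}
\underbrace{\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}}_{\text{DATA MATRIX}}
\underbrace{\begin{bmatrix} (1+\epsilon_1)e^{-\alpha x_1} & 0 \\ 0 & (1+\epsilon_2)e^{-\alpha x_2} \end{bmatrix}}_{\text{SPAT ERRORS}}
\underbrace{\begin{bmatrix} b_1 \\ b_2 \end{bmatrix}}_{\text{INPUT DATA}}
+
\underbrace{\begin{bmatrix} d_1 \\ d_2 \end{bmatrix}}_{\text{DET DARK}}
+
\underbrace{\begin{bmatrix} n_1(t) \\ n_2(t) \end{bmatrix}}_{\text{DET NOISE}}.
\tag{10}
$$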


If we assume that the acoustic attenuation α is small, then (10) reduces to

(11)

with terms labeled EXACT, SPATIAL, and TEMPORAL.


This latter formulation in (11) is attractive because it shows that spatial and temporal
errors can be separated in such an AO systolic processor. This is useful, since all
spatial errors can then be reduced to any desired level by applying the associated fixed
correction factors to the point light-modulator inputs and to the output detector
elements. In closing our remarks on error sources, we note that the new
spatially-multiplexed bipolar-data representation scheme we advanced earlier [7] is very
attractive since it does not result in the magnification of system errors. The various
biasing and scaling techniques previously proposed to accommodate bipolar data in such
an optical matrix-vector processor result in a magnification of any residual system
errors (by a factor equal to the dynamic range of the matrix). In general, such errors
rapidly become quite intolerable.

4. SIMULATION OF SAOP ERROR SOURCES


To determine the dominant error sources, to quantify the degree to which the various
error sources must be reduced, and to quantify the performance to be expected as a
function of all of the various system parameters, digital simulation techniques are
essential and were employed. In this section, we discuss several of the details
associated with our digital modeling and simulation of the SAOP system error sources and
model noted in Section 3. From (10), we note that the SAOP system and component errors
are multiplicative and are a matrix cascade. This is distinguished from the error source
analysis and modeling we conducted for the fixed-mask iterative optical processor (IOP)
system. In the case of the IOP system, we found the error sources of this architecture
to be additive [10], rather than multiplicative. We also note that for the matrix-matrix
multiplication required in the LU decomposition (one approach to the direct solution of
LAEs), the matrix cascade of errors is reversed since row-wise multiplication (rather
than column-wise multiplication) is employed (see Section 2). However, the same basic
results are expected for both systems. We also note that we assume (in our analysis)
that the residual spatial errors (for the input, AO cell and detector system) are
reduced to a significantly low level, but are present even after correction. Our intent
is to quantify the amount to which correctable spatially-fixed errors must be reduced
and the amount of time-varying detector noise and acoustic attenuation that is allowable
in such a processor.

In our digital modeling, we represent residual spatial errors (input, AO, and detector
plane) by Gaussian random variables with 3σ maximum deviations equal to the fractional
residual error remaining after corrections. These residual errors are included as
fixed-multiplicative factors that we apply to the point-source inputs and the detector
outputs at each matrix-vector multiplication. Detector plane temporal errors (noise
versus dark current spatial variations) are also modeled by similar Gaussian random
variables applied to each vector output produced on the detectors. However, a different
seed-value is used to produce uncorrelated noise that is added to each matrix-vector
output product at each T_B time to appropriately model detector system time-varying
noise. This approach models the time-varying detector noise and distinguishes the fixed
spatial errors from the time-dependent noise errors. Acoustic attenuation effects are
handled by directly including the necessary exponential attenuation factor into the
input data to the AO cell (and the associated transmittance of the AO cell). Acoustic
attenuation is dispersive. However, our initial tests included only a fixed attenuation
α, which can thus be transferred to the point-modulator input plane (and subsequently
corrected to the degree necessary).
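
The modeling approach can be sketched digitally as follows. This is not the authors' simulator: the function names fixed_gains and saop_matvec, the attenuation expressed as exp(-alpha*x) with alpha in natural units and x a hypothetical vector of cell positions, and the scaling of the noise to the largest exact output are all assumptions made for illustration.

```python
import numpy as np

def fixed_gains(n, residual, rng):
    """Fixed multiplicative spatial errors: Gaussian, with the 3-sigma
    deviation equal to the fractional residual error after correction."""
    return 1.0 + (residual / 3.0) * rng.standard_normal(n)

def saop_matvec(A, b, in_gain, det_gain, alpha, x, noise, rng):
    """One matrix-vector product with the error sources applied."""
    b_hat = b * in_gain * np.exp(-alpha * x)        # input-plane errors and attenuation
    c_hat = det_gain * (A @ b_hat)                  # detector spatial response
    sigma = (noise / 3.0) * np.max(np.abs(A @ b))   # assumed noise scale
    return c_hat + sigma * rng.standard_normal(len(c_hat))  # fresh noise each call

rng = np.random.default_rng(0)
A = np.array([[1.0, 2.0, 3.0], [2.0, 1.0, 1.0], [1.0, 1.0, 4.0]])
b = np.array([0.5, 0.2, 0.3])
x = np.array([0.0, 0.5, 1.0])                       # hypothetical positions in the cell

ideal = saop_matvec(A, b, np.ones(3), np.ones(3), 0.0, x, 0.0, rng)
in_gain = fixed_gains(3, 0.01, rng)                 # 1% residual input-plane error
det_gain = fixed_gains(3, 0.01, rng)                # 1% residual detector-plane error
noisy = saop_matvec(A, b, in_gain, det_gain, 0.0, x, 0.01, rng)
attenuated = saop_matvec(A, b, np.ones(3), np.ones(3), 0.1, x, 0.0, rng)
```

The gain vectors are generated once and reused for every product (fixed spatial errors), while the noise term is redrawn on every call (time-varying noise).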


As performance measures, we use three quantities. First, the Euclidean norm of the error
in the calculated vector b is used; i.e.,

$$ \|\Delta b\|_{avg} = \Big[ \sum_i (b_i - b_i^*)^2 \Big]^{1/2}. \tag{12} $$

This error measure corresponds to the average error in the calculated vector output b.
If we divide ||Δb|| by the norm of the exact b* vector, then

$$ |\Delta b|_{avg\%} = 100 \times \left( \|\Delta b\| / \|b^*\| \right) \tag{13} $$

defines the average percent error in the elements of the calculated vector. In (12) and
(13), ||( )|| denotes the Euclidean norm of the corresponding vector, b denotes the
measured value of the vector, and b* denotes the exact value of the associated vector.

The second performance measure we use is the maximum error in any single element of a
calculated vector. This corresponds to a very worst-case error. This performance measure
is described analytically by

$$ |\Delta b|_{max\%} = \max_i \left\{ 100\, (b_i - b_i^*)/b_i^* \right\}. \tag{14} $$

The final error measure we considered was the maximum error in the closed-loop poles of
the resultant system. This error measure is simply defined by

$$ |\Delta\lambda|_{max\%} = \max_i \left\{ 100\, (\lambda_i - \lambda_i^*)/\lambda_i^* \right\}, \tag{15} $$

where λ_i denotes the calculated poles and λ_i* denotes the location of the exact pole
values. This particular error measure is most appropriate for optimal control
applications. It is also most appropriate to provide a specific case study and
application of a situation in which a large error in one element of the vector output
does not appreciably affect the net performance of the system.


In general, different performance measures are appropriate for different problems and
applications. Attention to the worst-case element error in (14) is not an appropriate
measure of an optical LAE solution for many specific cases and applications. When the
application can be specifically defined, other performance measures are more appropriate
than this worst-case one. For certain specific case-studies and applications, we note
that the following performance measures are appropriate for the various indicated
applications: locations of the closed-loop poles of this system (this is appropriate for
control applications), SNR (this is appropriate for adaptive filtering applications),
and symbol error rate (this is appropriate for communication applications). For our
present studies, we use the three performance measures noted above: (12) (average
error), (14) (maximum percentage error in any element of the computed vector), and (15)
(the maximum percent error in the location of the closed-loop poles of this system).
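
The three measures can be evaluated digitally along the following lines. This is a sketch under stated assumptions: magnitudes are taken before the maximum in (14) and (15), computed and exact poles are paired by sorting, and the function names and numerical values are hypothetical.

```python
import numpy as np

def avg_error(b, b_exact):
    """Euclidean norm of the error, as in (12)."""
    return np.linalg.norm(b - b_exact)

def avg_percent_error(b, b_exact):
    """Error norm as a percentage of the exact vector norm, as in (13)."""
    return 100.0 * np.linalg.norm(b - b_exact) / np.linalg.norm(b_exact)

def max_percent_error(v, v_exact):
    """Largest element-wise percent error, as in (14) and (15);
    taking the magnitude is an assumption of this sketch."""
    return np.max(np.abs(100.0 * (v - v_exact) / v_exact))

def pole_percent_error(F, G, K, K_exact):
    """Maximum percent error in the closed-loop poles of F - G K.
    Pairing poles by sorting is an assumption of this sketch."""
    poles = np.sort_complex(np.linalg.eigvals(F - G @ K))
    exact = np.sort_complex(np.linalg.eigvals(F - G @ K_exact))
    return max_percent_error(poles, exact)

# Hypothetical values for illustration.
b_exact = np.array([1.0, 2.0])
b_meas = np.array([1.1, 2.0])
e_avg = avg_error(b_meas, b_exact)
e_avg_pct = avg_percent_error(b_meas, b_exact)
e_max_pct = max_percent_error(b_meas, b_exact)

F = np.array([[0.0, 1.0], [-2.0, -3.0]])
G = np.array([[0.0], [1.0]])
e_pole = pole_percent_error(F, G, np.array([[0.02, 0.0]]), np.array([[0.0, 0.0]]))
```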

5. INITIAL EXPERIMENTAL RESULTS

The purposes of our initial simulations were: (1) quantification of the amount of
allowable residual spatial errors in the input and detector planes (these are entered as
percentages in our tables); (2) quantification of the amount of allowable time-varying
detector noise (this parameter is also entered as a percentage in our tables of data);
and (3) quantification of the amount of AO cell acoustic attenuation possible (this is
entered in units of dB/cm in our tables of data). Only the results for 0.1 dB/cm, or
approximately 6% spatial errors across the AO cell, are shown and considered in the data
presented. This was necessitated by the fact that the use of larger α values yielded
unacceptable performance and no convergence for the algorithm in many instances. Data
for the three performance measures in (12), (14) and (15) are given in the tables.


The specific LAE solution case study used arose from the final outer-loop solution of an LQR
(linear quadratic regulator) design with the algebraic Riccati equation for an F100 aircraft
with three states and three controls [9,10]. This situation corresponds to an LAE of order
N = 9. The matrix associated with this matrix-vector solution has no specific structure and is
essentially full. It is characterized by a condition number C ≈ 2.48 and a dynamic range of
47.7. The acceleration parameter ω = 1/λmax = -0.044 was selected as we described earlier
[4,6]. For this specific problem, ||b*|| = 0.46 is the average output plane value. This
parameter can be used to express the average error ||Δb|| as a percentage of the indicated
system performance.

Our tests were intended to quantify the component performance of an indirect (or iterative)
solution using the Richardson algorithm and a direct solution (using LU decomposition). For
the iterative solution, we used J = 10 iterations. This value was determined from four times
the condition number of the matrix as described in [16].


TABLE 2. ERROR SOURCE EFFECTS IN AN INDIRECT LAE SOLUTION

          RESIDUAL SPATIAL    TIME-VARYING    AO CELL        ERROR MEASURES
TEST      ERRORS (%)          DETECTOR        ATTENUATION    AVERAGE         MAX. ELEMENT      MAX. POLE
NUMBER    INPUT     DET       NOISE (%)       (dB/cm)        ERROR (12)      ERROR (%) (14)    ERROR (%) (15)

1         0         0         0               0              0.28x10^-3      0.39              0.26x10^-3
2         0         0         1               0              0.71x10^-3      2.8               0.52x10^-2
3         1         1         0               0              3.37x10^-[?]    1.7               0.27
4         1         1         1               0              0.39x10^-[?]    1.36              0.26
5         1         1         1               0.1            0.16            61.8              15.8
6         0         0         0               0.1            0.16            60.4              16.1

In Table 2, we show the results of an indirect solution for six different sets of system
and component errors. The error and noise-free results in Test 1 were obtained with 36-bit
digital accuracy. As seen, excellent accuracy was obtained in these experiments (the
performance obtained is limited by the finite word length and the number of iterations
performed). Test 2 shows the effects of a 1% detector noise error source alone. The accuracy
obtained is better than 0.1% even though one element of the matrix-vector output was in error
by 2.8%. The effects of 1% spatial input errors and 1% spatial output errors alone (Test 3)
show that better than 1% accuracy is still possible on such a system. However, the maximum
error in one element of the computed vector is 1.7%. In Test 4, both spatial errors and
detector noise were present. The results of these tests confirmed the implications advanced
by our earlier findings on similar tests performed on our IOP optical matrix processor.
Specifically, we found that the presence of both spatial and temporal errors did not
appreciably affect the accuracy obtained in this system (compared to the case when only one
type of error was present). In Test 6, we used the value α = 0.1 dB/cm for the acoustic
attenuation present. From the results obtained and from the results in Test 5 (when acoustic
attenuation and all other error sources are present), we see that acoustic attenuation is the
dominant error source effect and that the α value used must be significantly reduced if we
are to obtain adequate performance from such a matrix-vector processor.


TABLE 3. ERROR SOURCE AND NOISE EFFECTS ON DIRECT AND INDIRECT
AND OPTICAL AND DIGITAL TRIANGULAR SYSTEM SOLUTIONS
TO LAES.

TEST NO.      RESIDUAL SPATIAL    TIME-VARYING    AO CELL        ERROR MEASURES
(BACK         ERRORS (%)          DETECTOR        ATTENUATION    AVERAGE         MAX. ELEMENT      MAX. POLE
SUBST)        INPUT     DET       NOISE (%)       (dB/cm)        ERROR (12)      ERROR (%) (14)    ERROR (%) (15)

1 INDIR       0         0         0               0              0.28x10^-3      0.39              0.26x10^-3
2 DIRECT      0         0         0               0              0.11x10^-5      0.18x10^-2        0.10x10^[?]
3 (VAX)       0         0         1               0              0.8x10^-3       1.27              0.88x10^-2
4 (SAOP)      0         0         1               0              1.0x10^-3       1.47              0.22x10^-1
5 (SAOP)      [?]       1         0               0              1.[?]x10^-2     6.0               0.34
6 (SAOP)      [?]       [?]       0               [?]            1.0x10^-2       8.8               0.39
7 (SAOP)      1         1         1               0.1            1.1x10^-2       9.2               0.37
8 (VAX)       1         1         1               0.1            [?]             8.7               0.57

In Table 3, we compare direct and indirect solutions to LAEs. The error and noise-free
results in Tests 1 and 2 show that better accuracy appears to be obtainable with a direct
solution. However, this is misleading since, if the number of iterations J were increased
to 50, then both algorithms would yield similar error and noise-free performance. In Tests
3 through 8, we included various amounts of spatial errors, temporal noise, and acoustic
attenuation. As seen, a direct algorithm yields better accuracy and performance than an
indirect algorithm. Specifically, 0.5-2% accuracy is obtained (the maximum error in one
element is 8-9%) even with 1% input spatial error, 1% output spatial error, 1% detector noise
and 0.1 dB/cm acoustic attenuation all present. We also note from these data that in a direct
solution, acoustic attenuation is no longer necessarily the dominant error source. Further
tests on various implementations of the direct LAE solution were also conducted and are
included in Table 3. These involve performing the matrix decomposition optically, followed by
the solution of the resultant triangular system of equations either digitally to 36-bit
accuracy (denoted by VAX in parentheses in Table 3) or optically using our triangular system
solutions algorithm [14] (denoted by SAOP in parentheses in Table 3). Comparing the results
of the VAX tests (3 and 8) and the SAOP tests (4-7), we find negligible difference in
performance whether the triangular system was solved optically or digitally. This is
expected due to the nature of the simpler vector inner product calculations required in a
triangular system solution.
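
The remark about inner products can be illustrated with a minimal sketch of a direct solution. It assumes a Doolittle LU factorization without pivoting (adequate only when the pivots are nonzero) and a small made-up test matrix. The function names are hypothetical and are not the algorithm of [14]. Each substitution step reduces to one vector inner product followed by one division.

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU factorization without pivoting (assumes nonzero pivots)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def forward_substitution(L, c):
    """Solve L y = c; each step is one inner product and one division."""
    y = np.zeros(len(c))
    for i in range(len(c)):
        y[i] = (c[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve U b = y from the last row upward, again by inner products."""
    n = len(y)
    b = np.zeros(n)
    for i in range(n - 1, -1, -1):
        b[i] = (y[i] - U[i, i + 1:] @ b[i + 1:]) / U[i, i]
    return b

# Illustrative 3 x 3 system (not the N = 9 case study of the text).
A = np.array([[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]])
c = np.array([1.0, 2.0, 3.0])
L, U = lu_no_pivot(A)
b = back_substitution(U, forward_substitution(L, c))
```

Only the decomposition step involves full matrix updates; the two substitutions touch one row at a time, which is one way to read the observation that they are less demanding of the processor.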

6.  SUMMARY,  CONCLUSION  AND  GUIDELINES

In this paper, attention was given to one optical systolic array architecture, the
frequency-multiplexed SAOP. The flexibility possible in formatting data in this architecture
was noted, together with examples of how the same architecture can be used for many different
operations. This flexibility plus the reduced component requirements (fewer point light
modulators and detectors and lower per-channel data rates are required to achieve performance
comparable to other architectures) appear to make this system more attractive than others.
Hence, we restricted attention to it. Extensions of this basic architecture using
multi-channel AO cells are also obvious and direct. In this paper, we also advanced the first
component error-source model for an optical systolic processor and we noted that many system
and component errors are spatially fixed and hence are correctable. To quantify the level
to which various errors and noise must be reduced and to quantify the performance expected,
we detailed the digital model for the SAOP architecture and its system and component error
and noise sources. By simulation, we tested an indirect algorithm solution and found that
acoustic attenuation was the dominant error source and that 1% accuracy could be obtained,
but that to do so the acoustic attenuation must be below 0.1 dB/cm. We compared direct and
indirect solutions and found that direct solutions yielded significantly better accuracy and
performance. Finally, we compared direct solutions in which the triangular system of
equations that resulted was solved optically and digitally. Negligible difference was found
between the two approaches. Using a direct solution, accuracy and performance approaching 1%
appear to be possible using realistically achievable component quality and detector noise.

Further tests, experimental verification, and more general analyses, trends and
quantification for other specific applications are necessary before definitive general
conclusions should be advanced. However, qualitative explanations for all of the results
obtained have been advanced and thus the trends observed appear to be representative of a
general matrix-vector LAE problem solution.


Optical  Kalman  filtering  for  missile  guidance


David  Casasent,  Charles  P.  Neuman,  and  John  Lycas 


Optical systolic array processors constitute a powerful and general-purpose set of optical
architectures with high computational rates. In this paper, Kalman filtering, a novel
application for these architectures, is investigated. All required operations are detailed;
their realization by optical and special-purpose analog electronics is specified; and the
processing time of the system is quantified. The specific Kalman filter application chosen is
for an air-to-air missile guidance controller. The architecture realized in this paper meets
the design goal of a fully adaptive Kalman filter which processes a measurement every 1 msec.
The vital issue of flow and pipelining of data and operations in a systolic array processor
is addressed. The approach is sufficiently general and can be realized on an optical or
digital systolic array processor.


I.  Introduction 

A multitude of optical systolic array processors1-5 have recently been proposed. These
processors comprise a broad class of optical linear algebra processors. Numerous engineering
applications of these processors have been described, including adaptive phased array
radar,6 optimal control,7,8 and Kalman filtering.3,9 In this paper, we detail the realization
of a discrete-time extended Kalman filter (EKF) for air-to-air missile guidance using optical
systolic array processors. This application provides a specific case study of the use of an
optical systolic linear algebra processor in a full problem application. This case study
leads to a novel discrete-time EKF algorithm with sufficient parallelism for realization on
an optical or digital systolic array processor. Our approach results in a novel algorithm and
novel operations that are possible on optical systolic processors. We realize an EKF because
the missile and target are modeled by linear differential equations in Cartesian coordinates,
whereas the measurement model is nonlinear. Linearizing the nonlinear measurement equation
about the most recent relative motion estimates results in an EKF. Discretization of the
continuous-time Kalman filter leads to a novel discrete-time algorithm. Such a discrete-time
algorithm is essential for realization on optical or digital systolic processors.

We note recent publications on systolic Kalman filters, which appeared as our work was
reaching completion. A steady-state analysis of finite word-length effects, roundoff-error
propagation, stability and estimation sensitivity is detailed in Ref. 10 for a systolic
Kalman filter architecture. Our fully adaptive optical systolic Kalman filter (which
processes a measurement every 1 msec) incorporates the automatic updating of the Kalman
filter gain and of the covariance matrix of the state estimation error, and thus differs
appreciably from this work. Extended Kalman filter algorithms for optical implementation are
proposed in Ref. 11, but implementation details are not provided. In this paper, we detail
the design and realization of a complete discrete-time EKF optical systolic array processor.

A.  Motivation 

Proportional navigation guidance (PNG)12 is the traditional guidance law used for air-to-air
missiles. In this controller, noisy measurements of the target's position and velocity are
fed to the PNG computer, which estimates the line-of-sight rate and calculates the missile
acceleration for the steering autopilot, which is then applied to the missile's actuators
(the fins) to control the missile's position and velocity. These estimates are fed back to
the PNG computer, new target measurements are taken, and the process is repeated. Of the PNG
assumptions, removing the assumption of a constant relative missile-to-target velocity
provides the largest improvement,13 especially for the case of evasive targets. For advanced
guidance laws to be practical, enhanced target motion estimates are required. Modern
filtering algorithms, such as the Kalman filter, can provide such estimates. The Kalman
filter provides the optimum estimate (in a least-mean-square or maximum-likelihood sense).
Such algorithms use the kinematics and dynamics of the missile and the target, plus the
statistics of the noise in the measurements and in the dynamic process disturbances.

B.  Overview 

In Sec. II, we highlight the missile-target and measurement models, and review the
conventional continuous-time Kalman filter and EKF formulations. A novel discrete-time EKF
is then introduced (in Sec. III) and the linear algebra operations required in each step are
defined. The major operation required is the solution of a quadratic matrix equation. In
Sec. IV, we review optical systolic processors with attention to one specific architecture3
and to the variety of operations achievable by format control. A new optical system solution
to a quadratic matrix equation is then advanced in Sec. V and the required operations are
noted. These include a new optical systolic system for calculation of the Jacobian matrix.
In Sec. VI, the realization of all operations required in our EKF is summarized, and the
load time and calculation time for each step in our algorithm are detailed. Our full system
architecture is advanced, the critical time path is isolated, and the processing time
required for our EKF is quantified. Our summary and conclusions are then advanced in Sec.
VII.

II.  Continuous-Time  EKF  for  Dynamic  Systems 

In  this  section,  we  highlight  the  missile-target  and 
measurement  models  and  review  the  continuous-time 
Kalman  filter  and  EKF  formulations. 

A.  Missile-Target  Model 

The linear dynamic system model for the missile and target is described by the matrix-vector
differential equation

$$\dot{\mathbf{x}}(t) = \mathbf{F}\mathbf{x}(t) + \mathbf{u}(t) + \mathbf{w}(t), \tag{1}$$

where

x(t) = (9 × 1) missile-target state vector (the state variables are the Cartesian
       coordinates of the relative target-to-missile position and velocity and the target
       acceleration);
F    = (9 × 9) missile-target state matrix;
u(t) = (9 × 1) missile acceleration control vector; and
w(t) = (9 × 1) missile and target dynamic disturbance vector; w(t) is modeled as a zero-mean
       Gaussian white-noise uncorrelated vector with covariance matrix Q, i.e.,
       w(t) ~ N[0, Q].

B.  Target  Measurement  Model 

We assume that a passive sensing system estimates the elevation (θ) and azimuth (ψ) of the
target.14 These polar coordinates are related to the relative Cartesian position coordinates
by a nonlinear transformation. We denote the relationship between the measured quantities
and the relative spatial coordinates by the elements $h_1$ and $h_2$ of a vector h. The
target measurement model is thus described by the nonlinear vector algebraic equation

$$\mathbf{z}(t) = \mathbf{h}[\mathbf{x}(t)] + \mathbf{w}_n(t). \tag{2}$$

We model the sensor noise vector $\mathbf{w}_n(t)$ by a zero-mean Gaussian white-noise
vector with covariance matrix R. In Eq. (2) we note that z(t) is the measurement; h[x(t)] is
nonlinearly related to the state vector x (since Cartesian coordinates rather than polar
coordinates are used). The angles θ and ψ are the directly measurable quantities (because of
the sensors used and the techniques available). We chose Cartesian coordinates since the
target-missile model is linear and its propagation is easier to realize. Since the
measurement model in Eq. (2) is nonlinear, we linearize the nonlinear function h[x] by the
matrix-vector product H(t)x(t), where H is the gradient of h. This approximation leads
to the EKF (Sec. III).
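
A concrete, hedged illustration of the linearization follows. The source does not give the explicit form of h, so the angle conventions below (elevation measured from the horizontal plane, azimuth measured in that plane) and the ordering of the nine state variables are assumptions made only for this sketch. The analytic gradient matrix is checked against central finite differences.

```python
import numpy as np

def h(x):
    """Assumed measurement function: [elevation, azimuth] from relative position.

    Assumed state ordering: x[0:3] relative position, x[3:6] relative velocity,
    x[6:9] target acceleration.
    """
    rx, ry, rz = x[0], x[1], x[2]
    rho = np.hypot(rx, ry)            # horizontal range
    theta = np.arctan2(rz, rho)       # elevation
    psi = np.arctan2(ry, rx)          # azimuth
    return np.array([theta, psi])

def H_analytic(x):
    """(2 x 9) gradient of h; only the position columns are nonzero."""
    rx, ry, rz = x[0], x[1], x[2]
    rho2 = rx**2 + ry**2
    rho = np.sqrt(rho2)
    r2 = rho2 + rz**2
    H = np.zeros((2, 9))
    H[0, 0] = -rx * rz / (rho * r2)
    H[0, 1] = -ry * rz / (rho * r2)
    H[0, 2] = rho / r2
    H[1, 0] = -ry / rho2
    H[1, 1] = rx / rho2
    return H

def H_numeric(x, step=1e-3):
    """Central-difference approximation of the same gradient."""
    H = np.zeros((2, 9))
    for j in range(9):
        d = np.zeros(9)
        d[j] = step
        H[:, j] = (h(x + d) - h(x - d)) / (2.0 * step)
    return H

# Illustrative state estimate (made-up numbers).
x_hat = np.array([1000.0, 500.0, 200.0, -50.0, 10.0, 0.0, 1.0, 0.0, 0.0])
H_lin = H_analytic(x_hat)
```

Because the angles depend only on relative position under these assumptions, H has six structurally zero columns, which is why re-evaluating it at each new estimate is inexpensive.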

C.  Continuous-Time  Extended  Kalman  Filter 

The objective of the extended Kalman filter (EKF) is to produce an estimate
$\hat{\mathbf{x}}(t)$ of x(t) for each θ and ψ measurement z(t). We linearize h(x) about the
most recent estimate $\hat{\mathbf{x}}(t)$ of the relative spatial coordinates, and
approximate h[x(t)] by the matrix-vector product H(t)x(t), where the gradient matrix is

$$\mathbf{H}(t) = \frac{\partial \mathbf{h}}{\partial \mathbf{x}(t)}.$$

We thus realize an EKF from the conventional KF. The three steps and three equations which
define an EKF follow15:

$$\dot{\hat{\mathbf{x}}}(t) = \mathbf{F}\hat{\mathbf{x}}(t) + \mathbf{K}(t)\{\mathbf{z}(t) - \mathbf{h}[\hat{\mathbf{x}}(t)]\} + \mathbf{u}(t), \tag{3}$$

$$\dot{\mathbf{P}}(t) = \mathbf{F}\mathbf{P}(t) + \mathbf{P}(t)\mathbf{F}^T + \mathbf{Q}(t) - \mathbf{P}(t)\mathbf{H}^T(t)\mathbf{R}^{-1}(t)\mathbf{H}(t)\mathbf{P}(t), \tag{4}$$

$$\mathbf{K}(t) = \mathbf{P}(t)\mathbf{H}^T(t)\mathbf{R}^{-1}(t). \tag{5}$$

The state-estimate equation in (3) propagates the estimate $\hat{\mathbf{x}}(t)$ of the
state vector. From Eq. (3) we see that the next state estimate is the weighted sum of the
output of the process or system model, plus the innovations process (the difference between
the measurement z and the nonlinear transformation h of the state vector), plus the control
vector u. The second term in Eq. (3) is an estimate of the measurement noise. If the system
noise increases, the present measurement is weighted more heavily (i.e., the KF gain K is
large). We note that the feedback of the current state estimate (including the KF gain) is
included via the control vector and the iteration implied in Eq. (3). Thus, the system model
is updated with the current state-estimate information from the KF.

The error covariance matrix P, which is a measure of the uncertainty of the
$\hat{\mathbf{x}}$ estimate, is defined by

$$\mathbf{P}(t) = E\{[\hat{\mathbf{x}}(t) - \mathbf{x}(t)][\hat{\mathbf{x}}(t) - \mathbf{x}(t)]^T\}. \tag{6}$$

Propagation of P is the major aspect of the KF and is defined by Eq. (4). The KF gain K is
defined by the matrix product in Eq. (5). From Eqs. (4)-(6), we see that P increases if the
system noise (Q) increases (this is logical). Thus, if Q increases, K increases and hence we
weight the present measurement more. If the measurement noise R increases, K decreases, and
we weight the system more heavily. Thus, the next estimate relies on the last estimate (if R
is large) and relies on the present measurement z (if R is small).

Fig. 1. Schematic diagram of a frequency-multiplexed acoustooptic systolic array
processor.3

The operations required in the EKF thus include: calculation of the KF gain in Eq. (5),
propagation of the error covariance matrix P(t) according to Eq. (4), and propagation of the
state estimate $\hat{\mathbf{x}}(t)$ according to Eq. (3).

III.  Discrete-Time  Extended  Kalman  Filter

In this section we develop our new discrete-time EKF algorithm. The need for a discrete-time
EKF arises because of the systolic processor realization (which requires pulsed data). Any
time-sampled processor (digital or optical) requires a discrete-time formulation. We used
the forward Euler method to discretize the state and update equations in (1) and (3) and the
trapezoidal rule to discretize the error covariance matrix propagation equation in (4).
These algorithms decouple the update equations. The resultant discrete-time EKF algorithm
becomes

$$\mathbf{K}_k = \mathbf{P}_k\mathbf{H}_k^T\mathbf{R}_k^{-1}, \tag{7}$$

$$\hat{\mathbf{x}}_{k+1} = [\mathbf{I} + T\mathbf{F}]\hat{\mathbf{x}}_k + T\mathbf{K}_k[\mathbf{z}_k - \mathbf{h}(\hat{\mathbf{x}}_k)] + T\mathbf{u}_k, \tag{8}$$

$$\mathbf{P}_{k+1}\mathbf{M}_{k+1}\mathbf{P}_{k+1} + [\mathbf{P}_{k+1}\mathbf{L}^T + \mathbf{L}\mathbf{P}_{k+1}] + \mathbf{C}_k = 0. \tag{9}$$

Our discrete-time EKF algorithm is thus novel and differs from prior applications of
discretization schemes16 (e.g., explicit algorithms and Runge-Kutta methods) to Kalman
filtering. In all our equations, the subscript k refers to the time index or the KF
iteration count. Equation (7) defines the KF gain, Eq. (8) characterizes the state-estimate
update, and Eq. (9) describes the error covariance update. In Eq. (8), T is the measurement
sampling interval. The matrices which we introduced in Eq. (9) are

$$\begin{aligned}
\mathbf{M}_{k+1} &= \mathbf{H}_{k+1}^T\mathbf{R}_{k+1}^{-1}\mathbf{H}_{k+1}, \\
\mathbf{L} &= (1/T)\mathbf{I} - \mathbf{F}, \\
\mathbf{C}_k &= \mathbf{P}_k\mathbf{M}_k\mathbf{P}_k - \mathbf{P}_k[(1/T)\mathbf{I} + \mathbf{F}]^T - [(1/T)\mathbf{I} + \mathbf{F}]\mathbf{P}_k - 2\mathbf{Q}.
\end{aligned} \tag{10}$$

Equations (7)-(10) outline the steps and identify the linear algebraic operations
(matrix-vector, matrix-matrix, and matrix-matrix-matrix multiplication) which are required
in each iteration to realize our discrete-time EKF. The solution of the symmetric quadratic
matrix equation in (9) is the major computational operation required. In Sec. V we detail
our new solution to the symmetric quadratic matrix equation in (9) for this application.
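
The step from the continuous-time covariance equation (4) to the quadratic matrix equation (9) can be made explicit. The following is a sketch only. It assumes that Q is constant over one sampling interval and writes M for the matrix that multiplies P on both sides in the quadratic term of Eq. (4). The shorthand f(P) is introduced here and is not notation from the paper.

```latex
\begin{aligned}
\dot{\mathbf{P}} &= f(\mathbf{P}) \equiv \mathbf{F}\mathbf{P} + \mathbf{P}\mathbf{F}^T + \mathbf{Q} - \mathbf{P}\mathbf{M}\mathbf{P}, \\[4pt]
\mathbf{P}_{k+1} - \mathbf{P}_k &= \tfrac{T}{2}\,\bigl[f(\mathbf{P}_{k+1}) + f(\mathbf{P}_k)\bigr]
  \qquad \text{(trapezoidal rule)}, \\[4pt]
0 &= \mathbf{P}_{k+1}\mathbf{M}_{k+1}\mathbf{P}_{k+1}
   + \bigl[\tfrac{2}{T}\mathbf{P}_{k+1} - \mathbf{P}_{k+1}\mathbf{F}^T - \mathbf{F}\mathbf{P}_{k+1}\bigr] \\
  &\quad + \bigl[\mathbf{P}_k\mathbf{M}_k\mathbf{P}_k - \tfrac{2}{T}\mathbf{P}_k
   - \mathbf{P}_k\mathbf{F}^T - \mathbf{F}\mathbf{P}_k - 2\mathbf{Q}\bigr], \\[4pt]
\tfrac{2}{T}\mathbf{P}_{k+1} - \mathbf{P}_{k+1}\mathbf{F}^T - \mathbf{F}\mathbf{P}_{k+1}
  &= \mathbf{P}_{k+1}\mathbf{L}^T + \mathbf{L}\mathbf{P}_{k+1},
  \qquad \mathbf{L} = (1/T)\mathbf{I} - \mathbf{F}, \\[4pt]
\mathbf{P}_k\mathbf{M}_k\mathbf{P}_k - \tfrac{2}{T}\mathbf{P}_k - \mathbf{P}_k\mathbf{F}^T - \mathbf{F}\mathbf{P}_k - 2\mathbf{Q}
  &= \mathbf{P}_k\mathbf{M}_k\mathbf{P}_k - \mathbf{P}_k\bigl[(1/T)\mathbf{I} + \mathbf{F}\bigr]^T
   - \bigl[(1/T)\mathbf{I} + \mathbf{F}\bigr]\mathbf{P}_k - 2\mathbf{Q} = \mathbf{C}_k .
\end{aligned}
```

Multiplying the trapezoidal step by 2/T and collecting the unknown terms on one side gives the third line; the last two lines identify the bracketed groups with the L and C_k terms of Eqs. (9) and (10). The implicit dependence on P_{k+1} is what makes Eq. (9) quadratic rather than a simple explicit update.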


IV.  Optical  Linear  Algebra  Systolic  Array  Processors 

Numerous optical systolic array processors have been described and some have been
analyzed.1 The acoustooptic (AO) optical systolic array processor we chose to detail for
this EKF application is the frequency-multiplexed system shown schematically in Fig. 1.3
The system consists of a linear input array of point modulators, each imaged through a
different region of an AO cell, and the Fourier transform (FT) of the light leaving the AO
cell formed on the output linear detector array. This specific system was chosen because it
is the most documented and analyzed one and because its flexibility leads to the realization
of a spectrum of linear algebraic operations by format control of the input data. In this
section we summarize the operations heretofore documented that are required for our present
EKF application, and we detail how each is realized on this system.

The inputs to the point modulators, light emitting diodes (LEDs) or laser diodes (LDs), are
space (x) and time (t), while the inputs to the AO cell are time (t) and frequency (f). The
time variable is converted to space as the contents of the AO cell travel across the
aperture in time. We achieve the matrix-vector (MV) multiplication Ab = c on this system as

$$\underbrace{\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}}_{\text{AO cell}}
\underbrace{\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}}_{\text{point modulators}}
= \underbrace{\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}}_{\text{detector outputs}} \tag{11}$$

To see how the operations described in Eq. (11) occur, we define the bit time $T_B$ as the
time it takes data in the AO cell to move from the region illuminated by the input point
modulator N to the region illuminated by the point modulator N + 1. We consider a (3 × 3)
matrix example as in Eq. (11). At time $1T_B$, we load the first column of the matrix into
the AO cell with each element present on a different frequency (f). At times $2T_B$ and
$3T_B$, we load the second and third columns of the matrix into the AO cell. At $3T_B$, the
full matrix is present in the AO cell, with its columns opposite point modulators 3, 4, and
5, respectively. At this point, the elements of the vector b are fed in parallel to the
point modulators 3-5. Each element of b multiplies the corresponding column of the matrix A.
The output FT sums the proper elements of the product in each row; and on the output
detectors, the MV product Ab appears in parallel. For an (N × N) element matrix, the MV
product appears in parallel in zero time (after a load time $NT_B$, during which the matrix
is loaded into the AO cell).
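
The data flow just described can be mimicked numerically. The sketch below is a simplified digital model, not the optical system itself: it assumes an idealized cell with exactly N spatial slots and N frequencies, ignores all error sources, and uses hypothetical names. Columns are shifted in one per bit time; each modulator then scales the column opposite it, and the detector at each frequency sums across the slots.

```python
import numpy as np

def systolic_mv(A, b):
    """Idealized model of the frequency-multiplexed MV product Ab = c.

    cell[f, s] is the amplitude at frequency f in spatial slot s of the AO cell.
    Returns the detector outputs and the load time in units of the bit time.
    """
    N = A.shape[0]
    cell = np.zeros((N, N))
    bit_times = 0
    for j in range(N):
        cell = np.roll(cell, 1, axis=1)   # contents travel one slot per bit time
        cell[:, 0] = A[:, j]              # next column enters, one element per frequency
        bit_times += 1
    # After loading, slot s holds column N-1-s, so the modulators are driven in that order.
    drive = b[::-1]
    detectors = cell @ drive              # the FT sums over space at each frequency
    return detectors, bit_times

A = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
b = np.array([1.0, -1.0, 2.0])
c, load_time = systolic_mv(A, b)
```

In this model all N inner products are formed in a single step once the matrix is in place, which corresponds to the statement that the product appears in parallel after the load time.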

Matrix-matrix (MM) multiplication for a (3 × 3) example is realized as

[Eq. (12): (3 × 3) matrix-matrix product, with the two factors labeled "AO cell" and "point
modulators" and the product labeled "detector outputs".]

This MM multiplication is a direct extension of the MV product in Eq. (11), repeated N
times. At each successive $T_B$ of time (after the load time), N vector inner products of
N-element vectors are formed in parallel on the N output detectors. A MM product thus
requires $NT_B$ of time (plus $NT_B$ of time to load the matrix into the AO cell). One row
of the MM product emerges in parallel on the output detectors every $T_B$ time interval.
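
The timing of the MM product can be sketched in the same idealized way. The data format below is an assumption made for illustration: the cell is taken to hold the transpose of the second factor, so that feeding one row of the first factor to the modulators yields one row of the product on the detectors. The source does not show the actual format of Eq. (12), and the function name is hypothetical.

```python
import numpy as np

def systolic_mm(A, B):
    """Idealized row-by-row MM product; time is counted in units of the bit time."""
    N = A.shape[0]
    cell = B.T.copy()        # assumed format of the loaded AO cell
    elapsed = N              # load time: one column per bit time
    rows = []
    for i in range(N):
        # N inner products formed in parallel: row i of AB appears on the N detectors.
        rows.append(cell @ A[i, :])
        elapsed += 1
    return np.vstack(rows), elapsed

A = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]])
B = np.array([[1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [0.0, 2.0, 1.0]])
C, total_time = systolic_mm(A, B)
```

The count of 2N bit times (N to load, N to compute) matches the figures quoted in the text for an (N × N) product.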

Matrix-matrix-matrix (MMM) multiplication is a further extension of the MM multiplication
in Eq. (12). We realize ABD = E by forming AB = C, feeding C to the AO cell as it is formed
(one row at a time), and then producing CD = E one row at a time in parallel. As we have
shown,3 operations and data flow ideally in this architecture (i.e., as one row of the
matrix is produced, one $T_B$ of the cell becomes vacant, and we immediately feed the row of
the MM product produced to the vacant slot in the AO cell).

Iterative MV algorithms for the solution of linear algebraic equations can also be realized
on this system. To solve Ab = c, we feed one iterate $\mathbf{b}_k$ of b to the point
modulators and A to the AO cell. We form $\mathbf{A}\mathbf{b}_k$ at the output, subtract c,
add $\mathbf{b}_k$ to the result, and feed this sum back to the point modulators as the next
iterative input $\mathbf{b}_{k+1}$. To solve Ab = c for b, we thus realize the Richardson
algorithm,3,6,17

$$\mathbf{b}_{k+1} = \omega(\mathbf{A}\mathbf{b}_k - \mathbf{c}) + \mathbf{b}_k, \tag{13}$$

where ω is the acceleration parameter chosen for stability.6 When
$\mathbf{b}_k \approx \mathbf{b}_{k+1}$, Eq. (13) produces the solution to Ab = c; i.e.,

$$\mathbf{b} = \mathbf{A}^{-1}\mathbf{c}. \tag{14}$$
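
A minimal numerical sketch of the iteration in Eq. (13) follows. The test matrix, the choice ω = -1/λmax, and the fixed iteration count are assumptions for illustration only. With the sign convention of Eq. (13), a negative ω is needed for convergence when A is positive definite, as it is in this example.

```python
import numpy as np

def richardson(A, c, omega, iterations):
    """Iterate b_{k+1} = omega * (A b_k - c) + b_k, starting from b_0 = 0."""
    b = np.zeros_like(c, dtype=float)
    for _ in range(iterations):
        b = omega * (A @ b - c) + b
    return b

# Illustrative symmetric positive-definite system.
A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
c = np.array([1.0, 2.0, 3.0])

lam_max = np.linalg.eigvalsh(A).max()
omega = -1.0 / lam_max        # eigenvalues of I + omega*A then lie in [0, 1)
b = richardson(A, c, omega, iterations=200)
```

The error contracts by at most the factor 1 - λmin/λmax per iteration in this setting, so the number of iterations needed grows with the condition number of A.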

The operations in Eqs. (11)-(13) realize all the steps required to implement our
discrete-time EKF with the exception of the solution of the symmetric quadratic matrix
equation in (9). Our prior approaches to solving a quadratic matrix equation used two
iterative loops to implement the Kleinman and Richardson algorithms, respectively.7 In
Sec. V we introduce a new optical solution to the symmetric quadratic matrix problem and
detail its implementation. This novel algorithm has many advantageous features compared with
those which we previously reported.7

V.  Optical  Systolic  Algorithms 

In this section, we apply the Newton-Raphson algorithm to introduce an optical systolic
solution to the symmetric quadratic matrix equation in (9) and highlight the efficient
calculation of the Jacobian matrix.

A.  Optical  Systolic  Solution  to  a  Quadratic  Matrix 
Equation 

The optical solution we introduce to solve the symmetric quadratic matrix equation

$$\mathbf{G}_k \equiv \mathbf{P}_{k+1}\mathbf{M}_{k+1}\mathbf{P}_{k+1} + [\mathbf{P}_{k+1}\mathbf{L}^T + \mathbf{L}\mathbf{P}_{k+1}] + \mathbf{C}_k = 0 \tag{15}$$

for $\mathbf{P}_{k+1}$ is the Newton-Raphson18 algorithm. We write Eq. (15) as the
lexicographically ordered vector $\mathbf{g}(p_1 \ldots p_{81}) = 0$, where $\mathbf{p}_n$
contains the N² = 81 elements of $\mathbf{P}_n$. We thus convert Eq. (15) into a system of
N² = 81 simultaneous quadratic equations (for a 9-state problem). The elements
$(p_1 \ldots p_{81})$ of $\mathbf{P}_{k+1}$ are the desired solution. The Newton-Raphson
algorithm to solve Eq. (15) is16

$$\mathbf{p}_{n+1} = \mathbf{p}_n - \mathbf{J}[\mathbf{p}_n]^{-1}\mathbf{g}[\mathbf{p}_n], \tag{16}$$

where the Jacobian matrix $\mathbf{J}[\mathbf{p}_n]$ is defined by

$$J(i,j) = \partial g_i/\partial p_j\,\big|_{\mathbf{p}_n} \quad \text{for } i,j = 1, \ldots, 81. \tag{17}$$

The Jacobian in Eq. (17) is thus an (N² × N²) = (81 × 81) matrix. We chose this algorithm
because it is quadratically convergent and is a single-step algorithm (i.e.,
$\mathbf{J}[\mathbf{p}_n]$ is computed from the nth iterate $\mathbf{p}_n$ alone and not
from prior iterates of p).

Fig. 2. Triple-nested discrete-time EKF algorithm.

The four steps in each iteration of the algorithm in Eq. (16) and the operations required
to implement it are:

(i) Calculate the constituent matrices in Eq. (15) and form the N² vector
$\mathbf{g}[\mathbf{p}_n]$. This step requires matrix-matrix and matrix-matrix-matrix
multiplication.

(ii) Calculate the N² = 81 elements of the Jacobian matrix J(i,j) in Eq. (17). In Sec. V.B
we describe a novel optical or digital approach for this step.

(iii) Modify Eq. (16) and solve Eq. (15) in the form

$$\mathbf{J}[\mathbf{p}_n]\,\mathbf{s}_n = -\mathbf{g}[\mathbf{p}_n] \tag{18}$$

for $\mathbf{s}_n$, where

$$\mathbf{s}_n = \mathbf{p}_{n+1} - \mathbf{p}_n. \tag{19}$$

To solve the system of linear algebraic equations in (18), we apply the iterative
Richardson algorithm17 in Eq. (13) in the form

$$\mathbf{s}_{r+1} = \mathbf{s}_r + \omega\{\mathbf{g}[\mathbf{p}_n] + \mathbf{J}[\mathbf{p}_n]\,\mathbf{s}_r\}. \tag{20}$$

(iv) Compute $\mathbf{p}_{n+1}$ from Eq. (19).

This formulation is attractive since it circumvents the need to invert the Jacobian matrix
in Eq. (16).

We illustrate our discrete-time EKF algorithm in the block diagram of Fig. 2. Our algorithm
thus incorporates three nested iterative loops. In the innermost loop (index r), we apply
the Richardson algorithm in Eq. (20) to solve Eq. (18) for $\mathbf{s}_{r+1}$. The
iterations in Eq. (20) continue until $\mathbf{s}_{r+1} = \mathbf{s}_r$. When the iterative
Richardson algorithm converges, we set $\mathbf{s}_n = \mathbf{s}_{r+1}$ and form
$\mathbf{p}_{n+1}$ from Eq. (19). If $|\mathbf{s}_n| > \epsilon$, we begin a new
Newton-Raphson loop (index n), and calculate $\mathbf{g}[\mathbf{p}_n]$ in Eq. (15) and
$\mathbf{J}[\mathbf{p}_n]$ in Eq. (17). We then repeat the Richardson iterative algorithm in
Eq. (20) for the new $\mathbf{g}[\mathbf{p}_n]$ vector and Jacobian
$\mathbf{J}[\mathbf{p}_n]$ matrix. This two-loop Newton-Raphson/Richardson iterative
procedure is repeated until $|\mathbf{s}_n| < \epsilon$, at which point we set
$\mathbf{P}_{k+1} = \mathbf{P}_{n+1}$. We now return to the KF loop (index k) and calculate
a new $\mathbf{K}_k$ and $\hat{\mathbf{x}}_{k+1}$. The next measurement $\mathbf{z}_k$ can
then be accepted. In Sec. VI we detail our full system architecture and quantify the
processing time for our EKF.
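
The two inner loops can be sketched digitally for a small problem. Everything numerical below is an assumption for illustration: a 2-state toy model instead of the 9-state case, hand-picked F, H, R, Q and T, a hand-picked ω, and a constant M over the step. The Jacobian is formed directly from the differential of the quadratic matrix equation, dG = dP Bᵀ + B dP with B = PM + L (B is notation introduced only for this sketch), which has the same Kronecker-sum structure as the Jacobian discussed in Sec. V.B. The function names are hypothetical.

```python
import numpy as np

def residual(P, M, L, C):
    """G(P) = P M P + P L^T + L P + C, the quadratic matrix equation."""
    return P @ M @ P + P @ L.T + L @ P + C

def solve_quadratic(M, L, C, P0, omega, eps=1e-12, max_newton=20, max_richardson=500):
    """Newton-Raphson outer loop with a Richardson inner loop (no matrix inversion)."""
    N = P0.shape[0]
    I = np.eye(N)
    P = P0.copy()
    for _ in range(max_newton):
        g = residual(P, M, L, C).reshape(-1)      # row-ordered residual vector
        B = P @ M + L
        J = np.kron(B, I) + np.kron(I, B)         # Jacobian of g with respect to p
        s = np.zeros(N * N)
        for _ in range(max_richardson):           # Richardson: s <- s + omega*(g + J s)
            s_next = s + omega * (g + J @ s)
            done = np.linalg.norm(s_next - s) < eps
            s = s_next
            if done:
                break
        P = P + s.reshape(N, N)                   # Newton update p_{n+1} = p_n + s_n
        if np.linalg.norm(s) < eps:
            break
    return P

# Toy 2-state problem (all values are illustrative assumptions).
T = 0.1
F = np.array([[0.0, 1.0], [0.0, 0.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
Q = 0.1 * np.eye(2)
P_k = np.eye(2)
I2 = np.eye(2)

M = H.T @ np.linalg.inv(R) @ H
L = I2 / T - F
C = P_k @ M @ P_k - P_k @ (I2 / T + F).T - (I2 / T + F) @ P_k - 2.0 * Q

# The eigenvalues of J are near 2/T = 20 here, so omega is set to about -1/21.
P_next = solve_quadratic(M, L, C, P0=P_k, omega=-1.0 / 21.0)
```

Because the (1/T)I term dominates L for a short sampling interval, the Jacobian is well conditioned in this toy case and both loops converge in a few iterations; that behavior is a property of the example and is not a claim about the 9-state problem.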

B.  Efficient  Calculation  of  the  Jacobian 

The calculation of the Jacobian matrix is a crucial step in a Newton-Raphson solution. We have developed an efficient technique for calculating J that uses the table look-up method of Blackburn19 (developed for the solution of the algebraic Riccati equation). We modified this table look-up method for use on an optical or digital systolic parallel processor and applied it in our discrete-time EKF algorithm. We rewrite the Jacobian matrix in Eq. (17) as

$$J = A^T \otimes I + I \otimes A^T, \qquad (21)$$

where 

$$A = M_{k+1} P_n + L, \qquad (22)$$

and $\otimes$ denotes the Kronecker product; i.e.,

$$M \otimes P = [m_{ij} P]. \qquad (23)$$

Equations  (21)  and  (23)  illustrate  that  the  Kronecker 
product  reorders  the  elements  of  A.  Thus,  calculation 
of  J  can  be  simply  achieved  by  addressing  the  proper 
elements  of  A. 
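
The addressing idea can be illustrated with a short Python sketch. It assumes only the structure of Eq. (21); the function names and the dictionary-based table are hypothetical, and the sketch treats the full-order Jacobian rather than the reduced-order one. The table lists, for each nonzero entry of J, which one or two elements of A must be read (and summed), and it depends only on N, not on the values in A.

```python
import numpy as np

def jacobian_lookup_table(N):
    """Map each nonzero entry (row, col) of J to the A-indices that sum to it."""
    table = {}
    for i in range(N):
        for p in range(N):
            for j in range(N):
                for q in range(N):
                    terms = []
                    if p == q:               # term from A^T (x) I
                        terms.append((j, i))
                    if i == j:               # term from I (x) A^T
                        terms.append((q, p))
                    if terms:
                        table[(i * N + p, j * N + q)] = terms
    return table

def jacobian_from_table(A, table):
    """Assemble J by addressing (and, where needed, summing) elements of A."""
    N = A.shape[0]
    J = np.zeros((N * N, N * N))
    for (row, col), terms in table.items():
        J[row, col] = sum(A[a, b] for a, b in terms)
    return J

N = 3
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N))

table = jacobian_lookup_table(N)          # fixed once N is fixed
J_table = jacobian_from_table(A, table)
J_kron = np.kron(A.T, np.eye(N)) + np.kron(np.eye(N), A.T)

print(np.allclose(J_table, J_kron), len(table))
```

Entries with a single term correspond to reading one element of A; entries with two terms correspond to summing two elements of A.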


For an N-state problem, A in Eq. (22) is (N × N) and the full-order Jacobian J in Eq. (21) is ($N^2 \times N^2$). Because the matrices M, P, C, and G and the quadratic matrix equation in (15) are symmetric, there are N(N + 1)/2 = 45 unknown elements in P (rather than $N^2$ = 81). Thus, we can reduce the size of the Jacobian from (81 × 81) to (45 × 45). This simplifies the calculation of J and reduces the number of elements of A that must be addressed. The number of states or dimension (N = 9) of the problem and the symmetry of the matrices determine which elements of A form which elements of J. Since N is fixed, the same elements of A are addressed in each Newton-Raphson iteration. Although $P_n$ and hence $A_n$ change with the index n, the elements addressed remain the same. Thus, we need only form a new $A_n$ matrix and use the same processor to calculate the new $J_n$ matrix from this $A_n$ matrix. Since $s_n$ is a combination of $p_n$ (the lexicographically ordered elements of $P_n$), S is symmetric (i.e., $s_{ij} = s_{ji}$). By applying this property and the fact that J multiplies $s_n$ in Eq. (18), we can delete and combine redundant rows of J and the corresponding elements of $s_n$. We find that only 405 elements (or one-fifth of the 2025 elements of the reduced-order J) are nonzero and must be calculated. The system of Fig. 3 can compute J using this table look-up technique and the aforementioned algorithm. In this system, the matrix A is fed to the AO cell. At successive instants of time, the proper point modulators are pulsed on. This accesses the correct elements of A. By varying the strength of the point modulators, different weights or multiplications of an element of A can be achieved. By pulsing on two point modulators simultaneously, the sum of two elements of A can be produced. By this technique and architecture, J can easily be produced (one element at a time). The point modulators addressed in each $T_B$ are determined from a look-up table (and this table is fixed for an Nth order problem).

VI.  Systolic  EKF  Architecture  and  Processing  Time 

For our AO cell we assume an aperture time $T_A$ = 35 μsec, which is divided into 100 time slots; i.e., $T_B$ = 350 nsec (for a 3-MHz data rate per channel). Calculation of J in Fig. 3 (which incorporates 81 point modulators) requires $T_J$ = 20.7 nsec using this system. The nonlinear functions h[x] and H require evaluation of the arctangent and magnitude and are thus best formed in nonlinear analog modules. The calculation times for h and H (using conventional available off-the-shelf analog modules) are $t_h$ = 20 μsec and $t_H$ = 30 μsec, respectively. Faster computation of all these parameters is possible with a different AO cell, a higher data rate per channel and different analog modules. The performance goal (a 1-kHz measurement sample rate) can easily be achieved with these present components and the parameters noted above. Consequently, additional effort was not directed toward improving further the speed obtainable.

Fig. 4. Optical systolic discrete-time EKF processor architecture.

In the design of the optical system in Fig. 1, we restricted the number of frequencies to be a maximum of ten (to simplify the electronic support required) and we assumed an AO cell with a time-bandwidth product of 1000 maximum (this value is compatible with present off-the-shelf components). In Table I we list the linear algebraic operations which are required, and the system design parameters and performance specifications associated with each operation. The first three operations must be performed on (9 × 9) matrices. To represent bipolar data, we use 18 elements (space or time-multiplexed) to represent 9 bipolar values. The number of point modulators required in each case is noted (in all cases only 9 frequencies are used). The time to load the data into the AO cell of the system and the calculation time (once the data are loaded) are noted separately in units of the bit time $T_B$ and as a function of the dimension N of the matrix. In general, the load time does not enter into the full processing time, since the operations can be pipelined to allow new data to be loaded as calculations proceed. The symbols used for each operation are noted in the last column of Table I.

In Fig. 4 we display the architecture of our final EKF design. We employ two of the optical systolic processors depicted in Fig. 1. The first system uses 35 input point modulators and the second uses 91 input point modulators. Each system employs 9 multiplexed frequencies within the AO cell. This allows optimal use of parallelism in the operations required in our EKF algorithm.

In  Table  II  we  compile  the  operations  and  load  and 
calculation  times  for  the  sequence  of  steps  (for  each 
time-sample  k )  in  our  optical  systolic  discrete-time  EKF 
processor  (Fig.  4).  The  asterisks  in  the  table  label  the 
critical  time  paths  in  our  processor. 

We summed the load and calculation times delineated in Table II and found the total calculation time for one time-sample k in our EKF processor to be

$$T_k = 42T_B + 30\ \mu\text{sec} + n[18T_B + 20\ \mu\text{sec} + r\,455T_B], \qquad (24)$$

where n and r denote, respectively, the number of Newton-Raphson and Richardson algorithm iterations required. For the bit time $T_B$ = 350 nsec, the computation time is

$$T_k = 44.7 + n[26 + 159r]\ \mu\text{sec}. \qquad (25)$$

The largest time is spent in the innermost Richardson loop. (The solution of the symmetric quadratic matrix equation in (15) for $P_{k+1}$ consumes 95% of the computation time.) We have thus optimized our system, the data flow, and the data format to reduce the computational time for this specific bipolar MV operation (in Tables I and II). From our initial simulations, we found n = 2 or 3 and r = 2 to be adequate to achieve convergence of our algorithm. On substituting these values into Eq. (25), we find the processing time for one EKF sample to be $T_k$ = 0.73 msec or $T_k$ = 1.08 msec. These correspond to a measurement data rate of $1/T_k$ = 1.36 kHz or 0.926 kHz. These data rates are adequate for our present goal of ~1-msec data measurements. As we noted at the outset of this section, faster calculation times are possible with different choices of system and component parameters.
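
The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the constants are taken directly from Eq. (24) with $T_B$ = 350 nsec.

```python
T_B_US = 0.350  # bit time in microseconds (350 nsec)

def ekf_sample_time_us(n, r, t_b=T_B_US):
    """Eq. (24): time for one EKF sample, in microseconds.

    n = number of Newton-Raphson iterations, r = number of Richardson iterations.
    """
    return 42 * t_b + 30.0 + n * (18 * t_b + 20.0 + r * 455 * t_b)

for n in (2, 3):
    t_us = ekf_sample_time_us(n, r=2)
    print(f"n={n}, r=2: T_k = {t_us / 1000:.2f} msec, rate = {1000 / t_us:.2f} kHz")
```

With n = 2 and r = 2 the total is 734.3 μsec, of which the term $n\,r\,455T_B$ = 637 μsec is spent in the Richardson loop.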

VII.  Summary  and  Conclusions 

In this paper we have considered a specific problem in optical linear algebra, the realization of a Kalman filter (for an air-to-air missile engagement) and have developed a new discrete-time extended Kalman filter algorithm for this application. We have structured the steps in this algorithm in terms of basic linear algebra operations. We then detailed how each operation could be realized on an optical systolic array processor, and the pipelining and data flow between all steps. A specific frequency-multiplexed optical systolic processor was chosen since it allows the flexibility of performing different operations with different data encoding and format control techniques. A new triple-nested algorithm was devised to solve the discrete-time extended Kalman filter problem for this application. This algorithm also involved a new optical technique to evaluate the Jacobian matrix. The performance achieved in this design was found to be adequate for our intended measurement sampling rate. This represents a new application of systolic array processors (optical or digital). This paper represents an important case study in how individual operations must be pipelined and
nested to solve an overall system problem.

Kalman filtering and state estimation are major techniques used in many control and signal processing applications.1,2 A Kalman filter produces the optimum least mean square or maximum-likelihood estimate of the state of a linear system driven by additive noise. In Kalman filtering, a system or process model with additive noise and a sensor measurement system with additive noise are assumed. The Kalman filter provides estimates of the state of the system, the accuracy of the most recent estimate, and the control for the system. This is achieved, assuming that the system and measurement noise are known (zero-mean Gaussian statistics are assumed), by recursive curve-fitting to estimate the state of the system. The next state estimate is a linear combination of the prior control, the prior estimate, and the uncertainty measurement of the sensor's noise. Depending upon the amount of process and sensor noise, more weight (through the Kalman filter gain matrix) is given to the present estimate or the present measurement.

In Ref. 3, we described a frequency-multiplexed acoustooptic (AO) processor and detailed how it was capable of performing all the individual operations (matrix-matrix-matrix multiplication, matrix inversion, etc.) required for Kalman filtering. However, the data flow and organization of all required operations were not detailed. In this Letter, we consider specifically a simpler Kalman filter state estimation problem. We assume that the measurement vector $z_k$ is received serially and is sampled at regular intervals. We now consider the problem of calculating the state estimation vector $\hat{x}_k$ and the extrapolated state estimation vector $\bar{x}_{k+1}$, assuming that the system's noise statistics are known. The data flow is found to be quite ideal in the iterative optical processor we devised for this Kalman filtering state estimation problem.

In Table I, we list the discrete-time Kalman filter equations. For more details and a derivation of these equations, Refs. 1-3 should be consulted. We assume equally spaced time-sampled intervals $kT_s$, where k is the iterative time index. We assume that the system noise vector w and the measurement noise vector v are uncorrelated and Gaussian-distributed and that the noise statistics (Q and R) and the system model ($\Phi$, $\Gamma$, H) are known. Thus the error covariance matrix P and the extrapolated error covariance matrix M can be precomputed, and hence the Kalman gain matrix $K_k$ can be precomputed and stored for each input time sample.

With these assumptions, we now consider the state of the filter and system and the calculations required after receipt of a new measurement sample $z_k$ at time $kT_s$. From the previous Kalman filter cycle, we have an extrapolated state estimate $\bar{x}_k$, and from the known noise statistics we have precomputed the Kalman gain matrices. In many cases, the noise statistics (and the matrices describing the system model) change sufficiently slowly that the storage requirements and updating requirements for $K_k$ and the other necessary system and noise matrices ($\Phi_k$, $\Gamma_k$, $H_k$, $Q_k$, and $R_k$) are not excessive. When a new measurement sample $z_k$ is obtained at time $kT_s$, the new state vector estimate $\hat{x}_k$ must be calculated from Eq. (1c), and then the new extrapolated state estimate vector $\bar{x}_{k+1}$ must be evaluated using Eq. (1d). Thus, with known noise statistics and a known system model, the Kalman filter state estimate calculations required for each new input sample are simply Eqs. (1c) and (1d) of Table I. Combining these equations, we describe the simplified Kalman filter and the calculations required by

$$\bar{x}_{k+1} = (\Phi_k - \Phi_k K_k H_k)\,\bar{x}_k + \Phi_k K_k z_k + \Gamma_k w_k, \qquad (2a)$$

$$\bar{x}_{k+1} = A_k \bar{x}_k + B_k z_k + \Gamma_k w_k. \qquad (2b)$$

Since the matrices $(\Phi_k - \Phi_k K_k H_k)$ and $\Phi_k K_k$ can be precomputed, we denote the associated matrices required in Eq. (2a) by $A_k$ and $B_k$ as noted in Eq. (2b) to simplify notation. If we had included the assumption of zero-mean system noise ($w_k = 0$), the equation would simplify even further.
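
As a quick numerical check of this combination step, the following Python sketch compares the two-step form of Eqs. (1c) and (1d) with the combined form of Eq. (2b). All matrices are random placeholders (in particular, the gain K here is not a true Kalman gain), and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_meas = 4, 2

Phi = rng.standard_normal((n_states, n_states))    # state transition matrix
Gamma = rng.standard_normal((n_states, n_states))  # noise input matrix
H = rng.standard_normal((n_meas, n_states))        # measurement matrix
K = rng.standard_normal((n_states, n_meas))        # stands in for a precomputed gain
x_bar = rng.standard_normal(n_states)              # extrapolated estimate at step k
z = rng.standard_normal(n_meas)                    # measurement at step k
w = rng.standard_normal(n_states)                  # system noise term

# Two-step form: Eq. (1c) followed by Eq. (1d)
x_hat = x_bar + K @ (z - H @ x_bar)
x_bar_next_two_step = Phi @ x_hat + Gamma @ w

# Combined form: Eq. (2b) with precomputed A and B
A = Phi - Phi @ K @ H
B = Phi @ K
x_bar_next = A @ x_bar + B @ z + Gamma @ w

print(np.allclose(x_bar_next, x_bar_next_two_step))
```

Once A and B are stored, each new sample costs two matrix-vector products and two vector additions.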


Table I. Discrete-Time (k) Kalman Filter Equations

Description                    Defining Equations
System model                   $x_{k+1} = \Phi_k x_k + \Gamma_k w_k$                  (1a)
                               $z_k = H_k x_k + v_k$                                  (1b)
State estimate                 $\hat{x}_k = \bar{x}_k + K_k (z_k - H_k \bar{x}_k)$    (1c)
Extrapolated state estimate    $\bar{x}_{k+1} = \Phi_k \hat{x}_k + \Gamma_k w_k$      (1d)

Fig. 1. Schematic diagram of an optical systolic processor for Kalman filter state estimation.

We consider the use of optical systolic array processors employing acoustooptic (AO) transducers to perform the necessary computations in Eqs. (2). Only single-channel optical systolic array processor architectures3,4 have been described thus far in the literature. This class of optical linear algebra processors is quite general purpose. Attention, however, must be given to the flow of data and operations in such systems; and for high computational efficiency one must avoid analog-to-digital conversion and the storage of intermediate data results. To perform efficiently the calculations necessary in Eqs. (2), we found that a multichannel optical systolic array processor yielded the best results. One realization with such a system is shown in Fig. 1.

This architecture is a new two-channel iterative optical systolic array processor. In the system of Fig. 1, two linear LED arrays are imaged onto two separate channels of an AO cell. This forms the separate product of the corresponding input LED data and the contents of the AO cell. Since all the data in the AO cell are present at the same frequency, all the light distribution leaving both channels of the AO cell will be deflected in the same direction and will thus be focused by lens $L_1$ at the same horizontal location in the output plane. Lens $L_1$ also vertically integrates and focuses the light leaving both channels of the AO cell. The size of the detector is chosen to collect all this light. Thus the summation of the total light distribution leaving both channels of the system is formed on the single-output photodetector (DET). The upper AO cell channel is fed with the measurement vector $z_k$, and the lower AO cell channel is fed with the prior state estimate $\bar{x}_k$. The upper LED array is fed with one row of the matrix $A_k$ in parallel, and the lower LED array is similarly fed sequentially with the rows of $B_k$. Subsequent rows of $A_k$ and $B_k$ are entered every $T_1$ in parallel. For an N × N matrix, the leftmost N LEDs (1 to N) are addressed with the first row of $A_k$ and $B_k$ at time step $T_1$. At $2T_1$, LEDs 2 to N + 1 are addressed with the next row of $A_k$ and $B_k$. Since the data in the AO cell move horizontally by a time step $T_1$, it is necessary to stagger the LEDs being addressed at each $kT_1$ time as detailed in Ref. 3.

At each $T_1$ time step, the output from the photodetector will be one element of the matrix-vector product and vector summation given by the first two terms in Eq. (2a) or (2b). After $NT_1$, the vector data have reached the end of the AO cell, and the entire matrix-vector product has been produced as the time-history output from the detector. As each element of the output vector is produced, the corresponding element of $\Gamma_k w_k$ [the last term in Eq. (2)] is added to it (using a simple resistor adder) to produce one element of the new state update vector $\bar{x}_{k+1}$. As this vector is produced, it is fed back directly into the lower AO cell. The aperture time of the AO cell is chosen to be $(2N - 1)T_1$, and (2N - 1) LEDs are used. Thus, after $NT_1$, a new extrapolated state estimate vector $\bar{x}_{k+1}$ has been produced and loaded into the cell (together with the new measurement vector $z_{k+1}$). The above iterative cycle can then be immediately repeated on the new sampled data. As seen, data flow in such a system is ideal. (As soon as an output is produced, it is loaded directly into the newly vacant slot at the transducer end of the AO cell.) The time history of the output from the adder is the new extrapolated state estimate, which can then be used for various control applications and other on-line adaptive processing functions depending upon the application. Many variations of this basic architecture are possible, such as the use of a linear CCD shift register detector readout system as in Ref. 4, or frequency-multiplexing of the LED or AO cell data as in Ref. 1. These different systems may be preferable for specific applications such as when the number of states is large, but the input data sampling rate is slow. The system of Fig. 1 appears to be the best general solution at present.
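
The staggered addressing and direct feedback described above can be mimicked with an idealized shift-register model. The Python sketch below is an assumption-laden illustration, not a model of the optics: each AO channel is a (2N - 1)-slot register that shifts one slot per time step, the first-entered vector element sits farthest from the transducer, and the state channel is paired with $A_k$ and the measurement channel with $B_k$ as Eq. (2b) requires. The function names and slot ordering are hypothetical.

```python
import numpy as np

def load(vec, n_slots):
    """Place a vector in a cell: element j sits in slot N-1-j (first-in is farthest)."""
    cell = np.zeros(n_slots)
    cell[:len(vec)] = vec[::-1]
    return cell

def systolic_cycle(A, B, state_cell, meas_cell, c, z_next):
    """One filter cycle (N time steps) of the idealized two-channel processor."""
    N = A.shape[0]
    out = np.empty(N)
    for t in range(N):
        # Stagger: at step t only LEDs t..t+N-1 are pulsed, with row t of each matrix.
        led_a = np.zeros(2 * N - 1)
        led_b = np.zeros(2 * N - 1)
        led_a[t:t + N] = A[t, ::-1]
        led_b[t:t + N] = B[t, ::-1]
        # The single detector sums the light from both channels; c[t] is added after it.
        out[t] = led_a @ state_cell + led_b @ meas_cell + c[t]
        # Data move one slot; the new output and new measurement enter at slot 0.
        state_cell = np.concatenate(([out[t]], state_cell[:-1]))
        meas_cell = np.concatenate(([z_next[t]], meas_cell[:-1]))
    return out, state_cell, meas_cell

N = 3
rng = np.random.default_rng(2)
A, B = rng.standard_normal((N, N)), rng.standard_normal((N, N))
x0, z0, z1, z2, c = (rng.standard_normal(N) for _ in range(5))

x1, s_cell, m_cell = systolic_cycle(A, B, load(x0, 2 * N - 1), load(z0, 2 * N - 1), c, z1)
x2, s_cell, m_cell = systolic_cycle(A, B, s_cell, m_cell, c, z2)
print(np.allclose(x1, A @ x0 + B @ z0 + c), np.allclose(x2, A @ x1 + B @ z1 + c))
```

In this model the second cycle starts immediately after the first, with no reloading of the registers, which mirrors the pipelining claim in the text.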

We now briefly consider the extension of this system to allow it to operate on bipolar-valued matrix and vector data. Many possibilities exist. The one we have found to be most attractive is a direct extension of the system of Fig. 1. We frequency-multiplex the inputs to the AO cell (and thus use both its bandwidth and delay time). For matrices and vectors with bipolar values, we enter the positive $x_k^+$ and negative $x_k^-$ parts of the vector $x_k = x_k^+ - x_k^-$ into the lower channel of the AO cell in parallel on two separate frequencies. We separate the positive and negative parts of the input matrices and time multiplex the LED outputs (first pulsing them on with the positive-valued matrix data and then with the negative-valued matrix data). A similar time and frequency division multiplexing is used for the measurement data and the upper AO cell channel. With the LEDs pulsed at twice the input data rate to the AO cell, the system thus operates properly with no reduction in the input data rate it can handle. At the output, we determine the magnitude and sign of $x_{k+1}$ and appropriately feed this data back to the AO cell iteratively as before. The sign of $x_{k+1}$ is determined from the time slot in the detector output with a nonzero value. From the sign of $x_{k+1}$, we select the multiplexed frequency input to the AO cell to be used. The data in such an architecture still pipeline ideally from the output detector back to the AO cell. Extensions of this technique to handling complex-valued data by the use of three frequencies to encode complex data by their projections on the 0, 120, and 240° axes in complex space6 also follow directly.
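
The arithmetic underlying this bipolar encoding can be sketched as follows. The Python fragment only shows that a bipolar matrix-vector product can be recovered from non-negative parts; how the optical system routes the partial products to detector time slots is not modeled, and the helper name is hypothetical.

```python
import numpy as np

def split_bipolar(v):
    """Return non-negative (positive part, negative part) such that v = pos - neg."""
    return np.maximum(v, 0.0), np.maximum(-v, 0.0)

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

A_pos, A_neg = split_bipolar(A)   # two LED time slots (assumed mapping)
x_pos, x_neg = split_bipolar(x)   # two AO cell frequencies (assumed mapping)

# Four non-negative partial products, grouped by the sign of their contribution.
y_plus = A_pos @ x_pos + A_neg @ x_neg
y_minus = A_pos @ x_neg + A_neg @ x_pos
y = y_plus - y_minus

print(np.allclose(y, A @ x))
```

Every quantity that would be carried as a light intensity (the four parts and the two sums) is non-negative; the sign appears only in the final subtraction.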

These techniques for handling bipolar data increase the required LED source modulation rate by a factor of 2 above the input data sampling rate. (However, the number of LEDs required is not increased.) More important, the bipolar data handling technique requires a fourfold increase in the AO cell's bandwidth and in its time-bandwidth product (a factor of 2 due to the two frequencies used and a factor of 2 due to the doubling of the LED modulation rate). Despite these disadvantages, this technique is more appropriate than the use of various biasing methods or the use of two cycles to process bipolar data (one cycle for positive data and one for negative data) because of the complicated detector postprocessing that results and/or the intermediate data storage required. Thus the reduced amount of data shuffling and postprocessing that results with the technique described above seems to make it preferable. If the dimension of the matrix-vector problem becomes too large (i.e., if the entire vector will not fit into one-half of the AO cell at one time), matrix partitioning techniques or several simple system modifications are required. These are simple conceptually and require using more than two AO cell channels plus time-division multiplexing of the LED inputs and the single detector output. We will detail such issues in a later publication.

The use of multichannel AO cells (together with proper time-division multiplexing of the inputs and outputs of the system) represents a major extension to this class of optical systolic array processor. Their applications to Kalman filtering, state estimation, and handling bipolar and complex-valued data appear to be quite significant.

ABSTRACT

An iterative algorithm for the solution of a quadratic matrix equation (the algebraic Riccati equation) is detailed. This algorithm is unique in that it allows the solution of a nonlinear matrix equation in a finite number of iterations to a desired accuracy. Theoretical rules for selection of the operation parameters and number of iterations required are advanced and simulation verification and quantitative performance on an error-free processor are provided. An error source model for an optical linear algebra processor is then advanced, analyzed and simulated to verify and quantify our performance guidelines. A comparison of iterative and direct solutions of linear algebraic equations is then provided. Experimental demonstrations on a laboratory optical linear algebra processor are included for final confirmation. Our theoretical results, error source treatment and guidelines are appropriate for digital systolic processor implementation and for digital-optical processor analysis.

1. INTRODUCTION

Optical linear algebra processors (OLAPs) represent a most general and attractive use of the parallelism and real-time processing features of optical systems [1]. The frequency-multiplexed acousto-optic (AO) processor [2,3] of Figure 1 represents a most general-purpose OLAP architecture with ease of fabrication [4] and competitive computational rates [2,4].

In this architecture (Figure 1), N point modulator inputs are imaged through N separate regions of an AO cell. These individual regions are separated by $T_B$ of time (for propagation of the acoustic wave) and by a physical distance $d_B$. In [2], the use of this processor in iterative algorithms, direct LU and QR matrix decomposition algorithms, and triangular system solutions was detailed.


FIGURE  1 

Simplified  schematic  of  a  frequency-multiplexed  optical  linear 
algebra  processor  [3] 


In this paper, we consider the use of this processor for the solution of a nonlinear matrix equation (Section 2). The specific application chosen is the solution of the algebraic Riccati equation (ARE). This nonlinear equation is similar to the expressions to be solved in Kalman filtering and other advanced modern signal processing algorithms. An iterative solution is necessary for such problems and for eigensystem solutions. Our proposed nonlinear ARE solution is unusual in that it requires a finite number of steps to achieve a specific accuracy and performance. In Section 3, we summarize selection of the operational parameters for such an iterative algorithm and the theoretical basis for our choice of the fixed number of iterations to be used. Section 4 presents initial error-free simulation data. In Section 5, we advance our error source model. In Section 6, we review our iterative and direct solutions to systems of linear algebraic equations (LAEs). This represents the fundamental operation required in advanced linear algebra algorithms. Section 7 contains simulation data to quantify the dominant error sources and the accuracy expected from such algorithms. We conclude in Section 8 with the experimental verification and quantification of our theoretical results. Our summary and conclusions are then advanced in Section 9.

2.  NONLINEAR  MATRIX  SOLUTION 

In  reference  [5],  we  detailed  a  solution  to  the  linear  quadratic  regulator  control  problem  to 
minimize  a  quadratic  performance  index  for  a  linear  system.  Computation  of  the  regulator 
feedback  gain  matrix  K  that  defines  the  optimal  controls  u  involves  the  solution  of  the  ARE 

$$SF + F^T S - SLS + Q = 0 \qquad (1)$$

for S. To achieve this, we used the Kleinman algorithm [5] and the solution of the vectorized Lyapunov equation to format the solution of (1) as a solution of the set of LAEs

$$H(k)\, s(k) = y(k), \qquad (2)$$

where s and y are the vectorizations of S and SLS-Q respectively and H is a Kronecker formatted matrix. This system of LAEs must be solved successively with different matrices H and vectors y, with the results of one cycle used to compute the matrix H and vector y for the next cycle. To achieve this, we employ a two-loop iterative algorithm described by

$$s(r+1,k) = [I - \omega(k) H(k)]\, s(r,k) + \omega(k)\, y(k). \qquad (3)$$

In solving (2) using (3), we solve (2) for one outer loop iteration k, update H and y and solve the next LAE. This procedure continues until s is of sufficient accuracy. The algorithm in (3) implies an iterative solution for each LAE. Direct solutions are also possible as we discuss in Sections 6 and 7. The indices r and k in (3) refer to Richardson (inner) and Kleinman (outer) loop iterations respectively.
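
A compact numerical sketch of such a two-loop procedure is given below. It is an illustration under several assumptions, none of which come from this paper: the ARE is taken in the common form $SF + F^T S - SLS + Q = 0$, the signs of H and y are chosen so that H is positive definite for the symmetric toy problem used, the step size is $\omega = 1/\|H\|_2$, and the iteration counts K and R are deliberately generous. The function names and test matrices are hypothetical.

```python
import numpy as np

def richardson(H, y, s0, omega, R):
    """Fixed number R of inner-loop iterations of the form of Eq. (3)."""
    s = s0.copy()
    for _ in range(R):
        s = s - omega * (H @ s) + omega * y   # s <- [I - omega H] s + omega y
    return s

def kleinman_richardson(F, L, Q, K=8, R=60):
    """Outer Kleinman loop (index k) around an inner Richardson loop (index r)."""
    n = F.shape[0]
    I = np.eye(n)
    s = np.zeros(n * n)                        # s(0,0) = 0; assumes F is stable
    for _ in range(K):
        S = s.reshape(n, n, order="F")
        Ac = F - L @ S                         # closed-loop matrix for this cycle
        # Vectorized Lyapunov equation Ac^T S' + S' Ac = -(Q + S L S), times -1
        H = -(np.kron(I, Ac.T) + np.kron(Ac.T, I))
        y = (Q + S @ L @ S).flatten(order="F")
        omega = 1.0 / np.linalg.norm(H, 2)     # assumed acceleration parameter
        s = richardson(H, y, s, omega, R)      # warm start from the prior estimate
    return s.reshape(n, n, order="F")

# Hypothetical symmetric, stable test problem
F = np.array([[-2.0, 0.5], [0.5, -1.0]])
L = 0.5 * np.eye(2)
Q = np.eye(2)

S = kleinman_richardson(F, L, Q)
residual = S @ F + F.T @ S - S @ L @ S + Q
print(np.linalg.norm(residual))
```

Both loops run for a fixed number of iterations with no convergence test, which is the mode of operation the following section sets out to justify.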

3.  OPERATIONAL  PARAMETER  SELECTION 

In an iterative algorithm such as (3), various operational parameters must be selected. The initial selection s(0,0) for S and the choice s(0,k) for each LAE solution are required. For s(0,0), we use 0 to ensure outer loop convergence (a stability matrix). For s(0,k), we use the obvious choice of the prior s(0,k-1) estimate. The acceleration parameter $\omega$ in (3) is chosen to be $\omega = n/\lambda_{max} \approx 3/\|H(k)\|$. This ensures inner loop convergence [2,5]. Stopping the inner loop iterations (index r) for each LAE solution and stopping the number of outer loop iterations (index k) is a major decision.

In  reference  [5],  we  derived  bounds  for  the  inner  loop  error,  the  outer  loop  error  and 
their  coupling.  From  this  analysis,  we  derived  the  selection  of  a  fixed  number  of  inner 
loop  iterations  R  to  solve  each  LAE  given  by 

$$R = nC = C \log a \approx 1.5C \text{ to } 3.0C, \qquad (4)$$

where $a$ denotes the initial error norm $\|x(0) - x^*\|$ and $[1 - 1/C]^R \approx \exp(-n) < 1/a$ is chosen. This follows from our analysis of the error in an iterative solution (due to a fixed number of iterations R), which showed that the norm of such an error is

$$\|s(r,k) - s^*\| \approx \|I - \omega H(k)\|^r \approx [1 - 1/C(k)]^r, \qquad (5)$$

where C is the condition number of H. Since r is expected to increase with C, we set r = nC and thus select n such that the error between the computed solution s and the exact solution s* in (5) is as small as is required. For the fixed number of outer-loop iterations K, we use K = 5 or 6, which can be theoretically derived (and appropriately modified) for other applications with matrices with specific features. These iterative operational parameter selections are summarized in Table 1.
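
The rule R = nC can be explored numerically with a short Python sketch. It assumes the logarithm in Eq. (4) is the natural logarithm (consistent with the exp(-n) factor) and uses hypothetical helper names; the condition number C = 50 is an arbitrary example value.

```python
import math

def inner_iterations(C, a):
    """Fixed inner-loop count R = n*C with n = ln(a), rounded up."""
    n = math.log(a)
    return math.ceil(n * C)

def residual_factor(C, R):
    """Per-solve error reduction [1 - 1/C]^R from Eq. (5)."""
    return (1.0 - 1.0 / C) ** R

C = 50.0
for n in (1.5, 3.0):                 # the range quoted in Eq. (4)
    a = math.exp(n)
    R = inner_iterations(C, a)
    print(f"n={n}: R={R}, factor={residual_factor(C, R):.4f}, 1/a={1 / a:.4f}")
```

Because $\ln(1 - 1/C) < -1/C$, the factor $[1 - 1/C]^{nC}$ always falls slightly below $\exp(-n)$, so choosing R = nC meets the target reduction 1/a.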

4.  ERROR-FREE  SIMULATION  RESULTS 

The performance measures we adopted to assess performance of the algorithm in Section 2 implemented using the operational parameters in Table 1 are the maximum percent error in any element of the matrix K (i.e., $\Delta K_{max}(\%)$) and the maximum error in the location of the closed-loop poles of the system ($\Delta\lambda_{max}(\%)$). We expect $\Delta K \gg \Delta\lambda$ and note that $\Delta\lambda$ is the more appropriate error measure for this specific application and that similar error measures should be used to evaluate the performance of other specific case studies. In Figures 2 and 3, we show the variation of these two error measures with the number of outer loop iterations k for a fixed number of inner loop iterations for two case studies. These case studies are the fifth (Figure 2) and third (Figure 3) order models of an F100 engine. As seen from the data for these two case studies, the use of a fixed number of iterations results in a monotonic decrease in the solution error with the $\Delta K$ error being approximately ten times that of the $\Delta\lambda$ error. From these results, we conclude that the use of a fixed number of iterations can yield adequate results when the number of iterations is properly chosen. Our parameter selection guidelines in Table 1 have thus all been verified and discussed.


TABLE 1

Operational Parameter Selection Guidelines [5]

SYMBOL    PARAMETER                            PREFERRED CHOICE

s(0,0)    Initial Initialization               s(0,0) = 0
s(0,k)    k-th Kleinman Loop Initialization    s(0,k) = s(0,k-1)
R         Number of Inner Loop Iterations      R = 1.5C to 3.0C
K         Number of Outer Loop Iterations      K = 5 - 6
ω(k)      Acceleration Parameter               ω(k) = 3/||H(k)||

FIGURE 2

Variation of the error measures $\Delta K_{\max}$ (%) and $\Delta\lambda_{\max}$ (%) with the number of
outer-loop iterations K for different inner-loop iteration stopping criteria for the
fifth-order HPG3 F100 model


FIGURE 3

Variation of the error measures $\Delta K_{\max}$ (%) and $\Delta\lambda_{\max}$ (%) with the number of
outer-loop iterations K for different inner-loop iteration stopping criteria for the
third-order HPG3 F100 model


5.  ERROR  SOURCE  MODEL 


In earlier publications [7,8], we detailed the first system and component error source
model for an OLAP and the general issue of errors in such an architecture. In this section,
we review this OLAP error source model. In this model, we distinguish input, AO cell and
detector plane errors separately. Spatial errors include: input and detector response
variations, errors in the interconnections between the input modulators and the AO cell,
and detector dark current. The spatial variations are fixed (time-independent) and are
correctable to small residual levels as required (by adjusting the gain of the input point
modulators, the detector amplifiers, and the input matrix and vector data). Detector noise is
the only time-varying error source considered. Acoustic attenuation produces a deterministic
exponential variation of the data in the AO cell. This effect is dispersive, but its
frequency dependence is not included in our present model. Acoustic attenuation can be corrected
at one frequency and is thus an input spatial error. The product of an input matrix A and vector
b thus yields a final output d given by Eq. (6).

As seen, the different types of system and component variations are described by error
matrices that multiply the input data vector or input matrix data. Thus, the system errors
are described by the corresponding variations in the data matrix and vector. The detector
dark current and noise appear additively in the output vector as shown in Eq. (6).


6.  DIRECT  AND  INDIRECT  SOLUTIONS  OF  LAEs 


The solution of a system of LAEs, A x = b, is the fundamental operation required in most
linear algebra processors and signal processing applications. Thus, we concentrate on this
function. The two major types of LAE solution are the direct (matrix decomposition) solution
and the iterative (indirect) solution.

The preferable iterative algorithm is [2,9]

$$x(r+1) = x(r) + \omega [b - A x(r)], \qquad (7)$$

where $\omega$ is an acceleration parameter chosen to ensure convergence. The iterations (described
by the iterative index r) continue until $x(r) \approx x(r+1)$. Then (7) reduces to $A x \approx b$ and
the system's output x is the desired solution. To implement (7) on the system of Figure 1,
the matrix data is fed to the AO cell one column at a time in parallel with the rows of the
matrix frequency-multiplexed, i.e. with the matrix elements $a_{mn}$ encoded in time and
frequency as a(f,t), and with the vector data x spatially-multiplexed as x(x) and fed in parallel
to the input point modulators. The matrix-vector product A x is formed and operated upon in
analog or digital post-processing electronics to produce the right-hand side of (7) and hence
the new x iterate input to the point modulators. Thus, the detector output is fed back to
the input point modulators. The length of the AO cell, $NT_B$, is chosen to be just
sufficient to accommodate the matrix data. Each $T_B$, as one column of the matrix leaves the AO
cell, it is reintroduced into the bottom of the cell. This recycling of the matrix data is
more efficient for system fabrication and reduces the effects of acoustic attenuation.
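
The iteration in (7) can be sketched in a few lines of software. The example below is only a numerical illustration, not a model of the optical hardware: the function names, the 2 x 2 test matrix and the value of the acceleration parameter are assumptions chosen here so that the iteration converges.

```python
def matvec(a, x):
    """Matrix-vector product A x (the step performed optically in the OLAP)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def iterate_solution(a, b, omega, iterations):
    """Run x(r+1) = x(r) + omega*[b - A x(r)] for a fixed number of iterations."""
    x = [0.0] * len(b)                       # start from the zero vector
    for _ in range(iterations):
        ax = matvec(a, x)
        # update formed in post-processing and fed back as the new iterate
        x = [x_i + omega * (b_i - ax_i) for x_i, b_i, ax_i in zip(x, b, ax)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]             # hypothetical positive-definite matrix
    b = [1.0, 2.0]
    print(iterate_solution(A, b, omega=0.25, iterations=60))
```

With this matrix the exact solution is x = (1/11, 7/11), and omega = 0.25 keeps every eigenvalue of $I - \omega A$ inside the unit circle, which is the condition for the fixed-point iteration to converge.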

In direct solutions, the matrix A and the vector b are multiplied by a decomposition
matrix $P_j$ to generate new $A_j$ and $b_j$. Each such matrix-matrix and matrix-vector
multiplication generates one row of the final A' matrix and one element of the final b' vector.
After each matrix-matrix multiplication, the order of the matrix and vector is reduced by
one and the new reduced-order $A_j$ and $b_j$ are multiplied by a new $P_j$. This procedure is
repeated N-1 times (for an N x N matrix) and yields a new upper-triangular matrix U and a new
vector b'. This simplified upper-triangular system of equations U x = b' is then easily
solved by back-substitution. The matrix decomposition can be realized either as an LU
decomposition (this is the technique we use when the matrix is positive-definite or
diagonally-dominant, as is the case here, since pivoting is then not required) or as a QR orthogonal
decomposition (this technique is more general and stable, but is more difficult to realize).
The detailed implementations of LU [2,10] and QR [2,11] decomposition and back-substitution
[2,12] have been described elsewhere. To implement the Gaussian-elimination algorithm (LU)
used in the present application on the system of Figure 1, we feed one row of the matrix A
to the AO cell in parallel (with the columns of A frequency-multiplexed, i.e. with the
elements $a_{mn}$ of A frequency and time encoded as a(t,f)) and with one row of the decomposition
matrix $P_j$ fed to the input point modulators in parallel (with the elements $p_{mn}$ of P time
and space encoded as p(t,x)). To facilitate data flow and for speed, we simultaneously
operate on A and b by using an augmented matrix. One row of the augmented matrix A' is
produced in parallel as a'(t,x) on the output detector during each of the N cycles. The new
$P_j$ matrix is easily calculated from the elements of the j-th column of the augmented matrix
in dedicated electronics.
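
The direct solution can be illustrated in software by Gaussian elimination without pivoting on the augmented matrix, followed by back-substitution. This is a sketch under stated assumptions: it applies the row operations directly rather than forming the decomposition matrices $P_j$ explicitly, the function names are hypothetical, and the diagonally-dominant test matrix is invented for the example.

```python
def eliminate(augmented):
    """Reduce an N x (N+1) augmented matrix [A | b] to upper-triangular form.

    No pivoting is used, which is safe only for matrices such as
    diagonally-dominant or positive-definite ones.
    """
    n = len(augmented)
    u = [row[:] for row in augmented]        # work on a copy
    for j in range(n - 1):                   # N-1 elimination steps
        for i in range(j + 1, n):
            m = u[i][j] / u[j][j]            # multiplier taken from column j
            u[i] = [u_ik - m * u_jk for u_ik, u_jk in zip(u[i], u[j])]
    return u


def back_substitute(u):
    """Solve the upper-triangular system U x = b' held in an augmented matrix."""
    n = len(u)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        inner = sum(u[i][k] * x[k] for k in range(i + 1, n))   # vector inner product
        x[i] = (u[i][n] - inner) / u[i][i]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]   # hypothetical test matrix
    b = [1.0, 2.0, 3.0]
    aug = [row + [b_i] for row, b_i in zip(A, b)]
    print(back_substitute(eliminate(aug)))
```

The back-substitution loop needs only one short inner product per unknown, which is why it is a much lighter computation than the elimination stage.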

7.  SYSTEM  ERROR  EFFECTS  ON  THE  SOLUTION  OF  LAEs 


The direct solution requires an AO cell of twice the length of the matrix, but achieves
the decomposition in a fixed number of steps. However, as noted in Section 3, iterative
algorithms can be operated with a fixed number of iterations to achieve a given desired
accuracy, and iterative algorithms are essential [2] for eigen-system solutions and the
solution of nonlinear matrix equations such as the ARE [5] and in Kalman filtering [13]. In
our new results (Sections 7 and 8), we compare [6] the performance of direct and iterative
algorithms in the solution of the LAEs that arise in a specific ARE solution for the F100 engine.
The two cases considered are third and fifth-order F100 models. These give rise to 9 x 9
and 25 x 25 matrices. Bipolar data is handled by space-multiplexing [3] and this doubles
the size of the matrices and vectors required. For the third-order problem, $C \approx 2.48$, the
dynamic range is 47.7 and from (5), $j \approx 10$ iterations are required to solve each LAE. For
the fifth-order problem, $C \approx 56.9$, the matrix dynamic range is 1117 and from (5), $j \approx 100$
iterations are required to solve each LAE. We consider three solutions: an iterative
algorithm, direct LU Gaussian-elimination with the back-substitution performed optically, and
direct Gaussian-elimination with the back-substitution performed digitally with high accuracy.
We consider two problems: the solution of $A_5 x_5 = b_5$ for the fifth and last outer loop in
(2) and (3) for the solution of the ARE, with $A_5$ and $b_5$ digitally calculated exactly,
and the solution of all five LAEs for all outer loop iterations.


TABLE 2

Performance of Three Algorithms for Two Data Sets in the Solution of One System of LAEs

ALGORITHM                                F100       RESP. VARIATIONS            ACOUSTIC ATTEN.
                                         DATA SET   Point Mods (%)   Dets (%)   (dB/cm)

(I)   Iterative
(II)  LU and Optical Back-Substitution
(III) LU and Digital Back-Substitution

TABLE 3

Performance of Three Algorithms for Two Data Sets in the Solution of the Nonlinear ARE

ALGORITHM                                RESP. VARIATIONS            ACOUSTIC ATTEN.   DET RMS     ||Δx|| (%)   Δλmax (%)
                                         Point Mods (%)   Dets (%)   (dB/cm)           NOISE (%)

(I)   Iterative                          1                1          0.001             0.6         2.98         0.77
                                                                                       0.06        5.24         1.62
(II)  LU and Optical Back-Substitution                                                 0.6         4.56         (illegible)
                                                                                       6x10^-4     11.34        1.44
(III) LU and Digital Back-Substitution                                                 0.6         4.12         0.5
                                                                                       6x10^-4     10.17        1.17

In Table 2, we show the results for the solution of the single fifth set of LAEs. Our
results for the full set of five LAEs, i.e. the full ARE solutions, are included in Table 3.
Data sets 3 and 5 refer to the third and fifth-order F100 matrix problems respectively. The
performance measures used in evaluating each approach are the average norm $\|\Delta x\|$ of the
error in the calculated vector x and the maximum error $\Delta\lambda_{\max}$ in the location of the
closed-loop poles of the final system. The spatial, detector noise, and acoustic attenuation
errors noted earlier were selected to produce approximately equal output errors for each
error source treated separately.

In Tests 1 and 2, we see that our theoretical operational parameters (Table 1) are also
valid when noise and system errors are present. Comparing the results for Algorithms I and
II, we see that acoustic attenuation is the dominant error source for an iterative algorithm
and detector noise dominates the performance of a direct algorithm. This is expected
because of the cyclic data flow of the matrix in the AO cell during the iterative algorithm.
This alters C for the matrix. In the direct algorithm, detector noise on one cycle is fed
back to both the inputs and the AO cell and thus changes the noise distribution, and its
effects accumulate. Also, detector noise affects the small vector elements and this effect
also compounds on successive cycles. From the results of Algorithms II and III, we see that
optical back-substitution yields comparable performance to digital back-substitution. This
is expected, since the operations required in back-substitution are only vector inner
products and only N-1 of these are required. This is a substantially less computationally
intensive set of operations than those required in the matrix decomposition. Thus, the
accuracy of the matrix decomposition determines the final accuracy in our results. Comparing
the results for data sets 3 and 5 and the corresponding data in Tables 2 and 3, we see that
the larger matrix size and the increased number of steps required in the ARE versus the LAE
solution cause the required accuracy to increase for direct algorithms more than for
iterative algorithms (e.g. a lower acoustic attenuation constant $\alpha$ is noted to be required for
the iterative ARE solution than for a direct LAE solution). We have derived a theoretical
expression [6]

$$\alpha < 1/(2.3\,LC) \qquad (7)$$

for the amount of acoustic attenuation $\alpha$ in dB/cm allowed for convergence of an iterative
algorithm, where L is the length of the AO cell in cm. From the last two columns in both
tables, we see that $\Delta\lambda_{\max}$ errors are significantly less than $\Delta x$ errors, as expected. The
results in Tables 2 and 3 are in agreement with the theoretical guidelines in (7). From
Test 1 and all other tests, we find that spatial errors are additive and that for small errors
the percent performance scales with the magnitude of the error. In Tables 2 and 3 and in
(7), we assume that each $T_B$ of the AO cell corresponds to 1 mm and that new input data
to the point modulators in the AO cell is introduced every $T_B$. To achieve more practical
$\alpha$ levels, closer spacing of data packets in the cell is necessary. This can easily be
obtained by scaling the values given in Tables 2 and 3. Operation of the input point
modulators at a higher rate than the AO cell data [2] can also improve the $\alpha$ and detector noise
values found in Tables 2 and 3. These initial test results are intended to provide
guidelines for the efficient use of various algorithms, efficient solutions to linear and
nonlinear matrix equations, and quantitative data on performance expected. Our theory,
guidelines, and modeling are also appropriate for digital-optical linear algebra architectures.
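
To show how the attenuation bound in (7) scales, the sketch below evaluates $1/(2.3\,LC)$ for the two condition numbers quoted above. The function name and the cell lengths are hypothetical values chosen for illustration; they are not taken from Tables 2 and 3.

```python
def max_attenuation_db_per_cm(cell_length_cm, condition_number):
    """Upper bound on acoustic attenuation from alpha < 1/(2.3*L*C), in dB/cm."""
    return 1.0 / (2.3 * cell_length_cm * condition_number)


if __name__ == "__main__":
    # (cell length in cm, condition number); the lengths are illustrative assumptions
    for length_cm, cond in ((1.8, 2.48), (5.0, 56.9)):
        print(length_cm, cond, max_attenuation_db_per_cm(length_cm, cond))
```

Because the bound is inversely proportional to both L and C, a longer cell or a more poorly conditioned matrix tightens the attenuation that can be tolerated, which is consistent with the remark that closer spacing of data packets (a shorter cell) relaxes the requirement.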

8. REAL-TIME LABORATORY EXPERIMENTS

In Figure 2, we show the nine outputs from a laboratory system used to iteratively solve the
fifth set of LAEs for the third-order F100 model (Test 1, Table 2). The outputs are shown
after 80, 400 and 640 iterations. The laboratory system used a fixed 2-D photographic mask
for the matrix in place of the AO cell and 2-D space-multiplexing in place of
frequency-multiplexing. To accommodate bipolar data, the matrix and vector were biased positive. This
increased C to 120. The laboratory system was operated at a 10 MHz data rate per channel.
To facilitate easy monitoring of the system, we used $\omega = -0.125$. The number of iterations
$j \approx nC$ required for 0.6% accuracy was calculated from (3) to be 613 iterations. Our
experimental value of 640 iterations at which convergence occurred is thus in excellent agreement
with theory. In the laboratory system, the mask errors were ±7.2% and these dominated other
spatial system errors. The detector noise was measured as 0.4%. With these errors included
in our simulator, the solution vector x was calculated and compared to the ideal theoretical x*
value and to the x vector calculated on the laboratory system. The locations of the
closed-loop poles of the system in each case were calculated and compared. The results in Table 4
show excellent agreement (0.5% accuracy or better) in the location of the poles, with the
nature of the poles preserved (e.g. complex-conjugate pole pairs).
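
The quoted iteration count can be reproduced approximately with a one-line calculation. The sketch assumes that n is chosen as the natural logarithm of the inverse of the target accuracy, which is one reading of the condition exp(-n) < 1/a in Section 3 rather than a formula stated in this section; the function name is hypothetical.

```python
import math


def iterations_for_accuracy(condition_number, accuracy):
    """Estimate j = n*C with n = ln(1/accuracy) (an assumed reading of exp(-n) < 1/a)."""
    n = math.log(1.0 / accuracy)
    return n * condition_number


if __name__ == "__main__":
    # C = 120 and 0.6% accuracy are the laboratory values quoted in the text
    print(iterations_for_accuracy(120.0, 0.006))
```

Under this assumption the estimate is just under 614 iterations, in line with the 613 iterations quoted above and the 640 iterations observed experimentally.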


TABLE 4

Comparison of the Closed-Loop Poles Computed Theoretically and Using
the Optical Laboratory System

THEORETICAL POLE     OPTICAL LABORATORY     % ERROR
LOCATIONS            COMPUTED POLES

-20.45 + j6.26       -20.74 + j5.88         1
-20.45 - j6.26       -20.74 - j5.88
-4.53                -4.53

FIGURE 2

The nine photodetector outputs from a fixed mask OLAP at selected cycles in the iterative
solution of the system of LAEs $A_5 x_5 = b_5$ that arise in the final loop of
the solution of the nonlinear ARE: (a) 80 iterations, (b) 400 iterations, (c) 640 iterations


9.  SUMMARY  AND  CONCLUSION 

We have detailed a two-loop solution to the nonlinear ARE. In the iterative solution, a
fixed number of iterations can be employed to achieve a given performance accuracy. A
direct solution of each LAE can also be employed; however, the iterative solution is faster
($100T_B$ vs. $975T_B$). Selection of the operational parameters for the two-loop algorithm was
theoretically derived, verified by noise-free simulations and shown to be appropriate when
system noise and errors were present. The implementation of direct and iterative solutions
of LAEs on a frequency-multiplexed OLAP was detailed. A theoretical analysis of both
algorithms showed that acoustic attenuation was the dominant error source in iterative algorithms
and detector noise dominated direct algorithms. Our simulations verified these theoretical
predictions and quantified the performance obtained with each. Our theoretical values for
the amount of acoustic attenuation allowed to permit convergence of an iterative algorithm
were verified by simulations. We confirmed and quantified by simulations that optical
back-substitution yields comparable performance to its digital realization. Experimental
verification on a laboratory system was obtained. The guidelines and theory provided are
appropriate for various other systolic processors (optical and digital) and for high-accuracy
digital-optical linear algebra processors. Our nonlinear matrix solution using a fixed
number of iterations is appropriate for realization on any linear algebra processor.

ABSTRACT


The number of multiplications per second and fabrication issues associated with several
different acousto-optic systolic processors are discussed and the flexibility in the
operations achievable by format control is briefly reviewed. Emphasis is given to the effects
of divergence of the optical input beam. Various input sources and interconnection schemes
are considered. These include: fiber and GRIN optics, multi-channel acousto-optic cells
and individually collimated laser diodes. Quantitative theoretical and experimental data
are provided. A new architecture using spatial-multiplexing of the input sources and
frequency-multiplexing of the acousto-optic cell data is described and used for handling bipolar
and complex-valued matrix and vector elements.

1. INTRODUCTION

Optical matrix-vector processors [1,2] represent a most general-purpose class of optical
system. Optical systolic array processors, especially those using acousto-optic (AO) cells
[3-5,13], represent very practical systems that can be fabricated with present technology.
Many interested people feel that the optics community should fabricate an optical systolic
array processor rather than continue paper studies of such systems. In this paper, we
address several fabrication and architectural issues associated with AO systolic array
processors. In Section 2, we provide a quantitative assessment of the performance (in terms of
mults/sec) possible on two different basic AO systolic processors. A new architecture using
a multi-channel AO cell is described for use in cases when a higher computational rate is
required. Other more advanced multi-channel AO systolic processors have been advanced
elsewhere [13] and their use with digitally-encoded data for higher accuracy has also been
described. In Section 3, we briefly review some of the different operations required in
linear algebra and how all of the basic operations needed are possible (via format control) on
the same generic optical systolic array processor.

Our  initial  remarks  and  comparisons  of  different  architectures  (Sections  2  and  3)  are  also 
flavored  with  practical  fabrication  considerations.  In  Section  4,  we  specifically  address 
and  quantify  the  effects  of  optical  beam  divergence  on  the  performance  of  various  input 
to  AO  cell  interconnection  techniques  (Section  5)  suitable  for  a  wide  variety  of  AO  systolic 
array  processors.  In  Section  6,  we  address  several  architectural  issues  associated  with 
handling  bipolar  and  complex-valued  matrix  and  vector  data.  A  new  optical  systolic  array 
architecture  is  advanced  for  such  practical  applications.  Our  summary  and  conclusions  then 
follow  in  Section  7. 

The computational rate of an optical systolic array processor is the most discussed
performance parameter of such systems. However, the flow and pipelining of operations and data
in these systems is of equal importance [3], as is the ease of fabrication and the
flexibility of a given architecture (Section 3). Another vital factor is that the operations
possible on a given architecture must be properly arranged to solve a given problem. This
generally involves much more than simply a matrix-vector multiplication. Examples of the
detailed linear algebra operations required in various applications are available in the
literature. The examples thus far published include adaptive phased array radar [6], Kalman
filtering [3,7], and optimal control [8]. The need for parallel algorithms suitable for
optical architectures [9,10] is also of vital concern. The accuracy of optical linear
algebra processors is yet a final issue requiring attention in many applications.

In this present paper, we restrict attention to AO-based systolic array processors, since
they are the most easily fabricated architectures. We further consider only vector inner
product (VIP) and matrix-vector (M-V) architectures, since such systems have 1-D output
detector arrays. The high data rates from optical matrix processors are such that optical
linear algebra systems requiring 2-D detector output arrays (such as vector outer product
systems and certain matrix-matrix processors) pose severe output detector fabrication
requirements. Specifically, a 2-D parallel readout detector array with high data readout rates
is required. We also consider in this present paper only optical architectures operating on
analog data. These systems represent those architectures with the highest throughput. By
the use of multi-channel AO cells and various architectural changes, these basic systems we
consider can be extended to operate on digital and other encoded data. Other authors have
addressed various approaches to achieving high-accuracy optical systolic array processors
using various architectures and data encoding schemes. In this present paper, we will also
consider only optical systems capable of operating on general matrices with no special
structure. This class of system represents the most general-purpose architecture. Different
architectures [3,4] are suitable for matrix problems with special matrix structure.

Our results are sufficiently general to be applicable to many AO systolic processors. The
new architectures we describe can be extended (by the use of multi-channel cells and
additional linear modulator arrays) to handle encoded data (for applications where higher
accuracy is required). The computational rates possible from all optical systolic processors are
so large that one dimension of the multiplexed systems shown can easily be used for data
encoding. In such cases, the performance of the system is only reduced by a factor of 16-32
and still yields a quite significant number of mults/sec with a significantly more accurate
system with fewer dynamic range constraints.

2.  COMPUTATIONAL  RATES 


As noted in Section 1, the computational rate (mults/sec) possible is a favorite criterion
(but not the panacea) for comparing optical systolic array processors. Following the
terminology and motivation in [11], we now briefly compare the performance obtainable from the
two generic classes of optical systolic array processors. Rhodes [11] distinguished between
two types of AO systolic processors by the manner in which the AO cell was used. He refers
to these AO operating modes as a modulator (Figure 1) and a deflector/modulator (Figure 2).
Both approaches are self-explanatory.


FIGURE 1: Two Basic Acousto-Optic Modulator Vector Inner Product Processors.
(A) Integrating Detector System. (B) Single Detector Architecture.


The architectures in Figure 1 perform the basic operation of a VIP with one vector fed to
the AO cell and the other vector fed to the input point modulators. The output from the
system is the VIP. In the system of Figure 1B, the full VIP appears on one detector. In the
system of Figure 1A, the product of each corresponding element of each vector is formed on
separate detectors. The output detectors in Figure 1A can accumulate data or their contents
can be shifted and added. These operations can be used in performing matrix-vector
multiplications on a VIP processor. In the system of Figure 2, data is fed to the AO cell time
and frequency-multiplexed, i.e. the cell contains 2-D or matrix data and the basic operation
of the system is a matrix-vector multiplication.


FIGURE 2: Frequency-Multiplexed Modulator/Deflector Acousto-Optic Matrix-Vector Processor.


We denote the transit time in the AO cell between two adjacent spatially-illuminated
regions by the bit time $T_B$ and the full aperture time of the cell by $T_A$. In all cases,
efficient use of the system requires a cell with $T_A = 2NT_B$ and $2N - 1$ point modulators (where N
is the order of the vector or the matrix). To see this, recall that $NT_B$ of time is required
to load data into the cell and that $NT_B$ of the cell's aperture time is required for this
data. After $1T_B$ of time, the entire contents of the cell are no longer useable (since one
element of the vector has now left the cell). We could recycle this element into the bottom
of the cell, but this requires additional complexity and memory and complicates data flow and
feedback in general applications. Thus, we consider operation of all systems by initially
(at $1T_B$) pulsing on the bottom N point modulators with one input vector, forming one VIP,
then (at time $2T_B$) pulsing on the point modulators 2 to N+1 with new vector data, etc. In
this way, we multiply the vector in the cell by N different vectors before data reaches the
end of the cell. Every $T_B$, we input new data to the cell and thus maintain full throughput
in the system. In all systems, we thus assume $T_A = 2NT_B$. The architectures of Figures 1A
and 1B thus perform one VIP on N element vectors every $T_B$ or N VIPs in $T = NT_B$ (where $T_A = 2T$).
The operation of the system of Figure 2 can most easily be described by viewing the
contents of the AO cell as $N^{1/2}$ vectors (each of length $N^{1/2}$) with each vector on a separate
frequency carrier. The data leaving the AO cell in the system of Figure 2 thus consists of $N^{1/2}$
VIPs on $N^{1/2}$ element vectors. Since the data leaving the AO cell is Fourier transformed onto
the output plane in Figure 2, proper frequency-multiplexing and arranging of data can allow
each of these separate $N^{1/2}$ VIPs to be produced in parallel on $N^{1/2}$ separate output detectors.
When the input data to this system is properly multiplexed, a full matrix-vector
multiplication is performed each $T_B$ (this is compared to one VIP per $T_B$ for the architectures in Figure
1). We will denote the time bandwidth product (TBWP) of the AO cell as TBWP = 2N. Thus, the
systems in Figure 1 can operate on N element vectors, whereas the system of Figure 2 can
operate on an $N^{1/2} \times N^{1/2}$ element matrix (where $N^{1/2} N^{1/2} = N$). Furthermore, $T_B$ for the systems of
Figure 1 satisfies $T = NT_B$ or $N = T_A/2T_B$, whereas the system of Figure 2 requires $T = T_B N^{1/2}$,
or a larger $T_B$ since $N^{1/2} < N$, specifically $(N^{1/2})^2 = N$.

To quantitatively compare the performance and fabrication issues for these architectures,
we assume an AO cell with $T_A = 40\ \mu$sec ($T = 20\ \mu$sec) and TBWP = 2000 (i.e., N = 1000). The
systems in Figure 1 can thus perform

$$N \text{ mults}/T_B = 1 \text{ VIP}/T_B, \quad \text{or} \quad N^2 \text{ mults}/T = 1 \text{ M-V mult}/T, \qquad (1)$$

i.e., one VIP every $T_B$ or one matrix-vector (M-V) multiplication every T. This results in

$$1000^2/20\ \mu\text{sec} = 5 \times 10^{10} \text{ mults/sec} = 50 \text{ GOPS}. \qquad (2)$$

The system of Figure 2 performs

$$N^{1/2} \text{ VIPs}/T_B = 1 \text{ M-V mult}/T_B = N^{1/2} N^{1/2} \text{ mults}/T_B = N \text{ mults}/T_B = N N^{1/2} \text{ mults}/T, \qquad (3)$$

where $T = N^{1/2} T_B$ was used. This yields a computation rate for the system of Figure 2 of

$$1000(32)/20\ \mu\text{sec} = 1.6 \times 10^{9} \text{ mults/sec}. \qquad (4)$$

It is possible to pulse on each point modulator in Figure 2 $N^{1/2} \approx 32$ times per $T_B$. In this
case, (4) becomes

$$5.1 \times 10^{10} \text{ mults/sec}, \qquad (5)$$

and thus both architectures can achieve the same performance for the same I/O data rate (as
expected).
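
The arithmetic behind (2), (4) and (5) can be reproduced with the short sketch below. The function and parameter names are hypothetical; the value 32 is the rounded square root of N = 1000 used in the text and is passed in explicitly rather than computed.

```python
def vip_processor_rate(n, load_time):
    """Figure 1 systems: N^2 multiplications per load time T, as in Eqs. (1)-(2)."""
    return n * n / load_time


def matrix_vector_processor_rate(n, root_n, load_time, pulses_per_bit=1):
    """Figure 2 system: N * N^(1/2) multiplications per T, as in Eqs. (3)-(5)."""
    return pulses_per_bit * n * root_n / load_time


if __name__ == "__main__":
    T = 20e-6                                   # T = T_A / 2 = 20 microseconds
    print(vip_processor_rate(1000, T))
    print(matrix_vector_processor_rate(1000, 32, T))
    print(matrix_vector_processor_rate(1000, 32, T, pulses_per_bit=32))
```

The three printed rates correspond to 5 x 10^10, 1.6 x 10^9 and about 5.1 x 10^10 mults/sec, so the frequency-multiplexed system matches the VIP systems only once its point modulators are pulsed 32 times per bit time.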

However, let us consider the hardware and fabrication requirements of these architectures
to achieve the computation rate in (5). The architectures of Figure 1 require 2000 point
modulators, all packed very densely and all addressed in parallel at

$T_B = 20\,\mu\text{sec}/1000 = 20$ nsec, or at 50 MHz. (6)

This is quite a high data rate (and precludes A/D and D/A conversion at a large number of
bits). The very large number of sources required represents a considerable fabrication
achievement. Conversely, to achieve the same performance, the system of Figure 2 requires
only 64 point modulators, 32 detectors and a much longer bit time

$T_B = (T_A/2)/N^{1/2} = 0.625\,\mu$sec, or 1.6 MHz. (7)

As seen, the bit time is significantly relaxed, and the rate at which data must be fed in
parallel to each channel drops from 50 MHz to 1.6 MHz. Thus, the ease of fabrication (64 vs.
2000 point modulators) for the frequency-multiplexed modulator/deflector matrix-vector
processor of Figure 2 is quite attractive compared to the VIP architectures using AO modulators
in Figure 1.
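The bit times and per-channel data rates in (6) and (7) follow directly from $T$, $N$ and $N^{1/2}$. The sketch below reproduces them; the helper names are hypothetical, and the modulator count is taken as twice the vector length, matching the 2000 and 64 quoted in the text.

```python
# Bit time, data rate and modulator count for the two architectures.
T = 20e-6   # half the AO cell aperture time, in seconds
N = 1000    # vector length for the Figure 1 systems
N1 = 32     # N^(1/2), rounded as in the text


def bit_time(t, packets):
    """Bit time T_B when `packets` data packets share the time T."""
    return t / packets


def data_rate(t_b):
    """Per-channel data rate in Hz for a bit time t_b."""
    return 1.0 / t_b


def point_modulators(vector_length):
    """Number of point modulators, taken as twice the vector length."""
    return 2 * vector_length


if __name__ == "__main__":
    for label, size in (("Figure 1", N), ("Figure 2", N1)):
        t_b = bit_time(T, size)
        print(f"{label}: T_B = {t_b * 1e9:.0f} nsec, "
              f"rate = {data_rate(t_b) / 1e6:.1f} MHz, "
              f"modulators = {point_modulators(size)}")
```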


Should a given application require a computation rate above 50 GOPS, a multi-channel
AO cell can be used (Figure 3). Practical considerations dictate that the number of
channels (rows of the matrix A) will be less than the TBWP of each channel (the
number of columns of the matrix A). Frequency-multiplexing and matrix partitioning are thus
essential to redistribute the TBWP of the multi-channel AO cell. If we have $M_2$ AO
cell channels (each $2N_1T_B$ long), $M_1$ frequencies, $2N_1$ point modulators, and $M_1$ output arrays
of $M_2$ detectors each, the architecture of Figure 3 realizes $M_1$ matrix-vector multiplications per $T_B$
(where the matrix is $M_2 \times N_1$, i.e. one matrix per frequency, and the vector is of length $N_1$),
i.e.

$M_1$ M-V mults/$T_B$ = $M_1M_2N_1$ mults/$T_B$. (8)

Assuming reasonable parameters: $2N_1M_1$ = TBWP = 2000, $M_1 \approx N_1 \approx 32$ and $M_2 = 100$, we find

$10^5$ mults/$T_B$ = $10^5/0.625\,\mu\text{sec} = 1.6 \times 10^{11}$ mults/sec (9)

or with 32 pulses per $T_B$, we obtain

$5.1 \times 10^{12}$ mults/sec. (10)

This is equivalent to 3200 VIPs every 0.6 µsec. The computation rate in (9) or (10) can be
achieved with only 64 point modulators and a data rate of 1.6 MHz. With vertically-oriented
detector arrays, the system of Figure 3 can perform $M_2$ matrix-vector multiplications on $M_1 \times N_1$ matrices.

FIGURE 3: Multi-Channel AO Modulator/Deflector Frequency-Multiplexed AO Systolic Processor.
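The multi-channel rate in (8) to (10) can be evaluated with a short sketch. The function name is illustrative. Note that the exact product $M_1M_2N_1 = 102400$ is rounded to $10^5$ in the text, so the values computed here come out slightly above the quoted ones.

```python
# Multi-channel, frequency-multiplexed computation rate, following (8).
M1 = 32            # number of multiplexed frequencies
M2 = 100           # number of AO cell channels
N1 = 32            # vector length
T_B = 0.625e-6     # bit time in seconds


def multichannel_rate(m1, m2, n1, t_b, pulses_per_bit=1):
    """M1*M2*N1 multiplications per bit time, optionally pulsed several times per T_B."""
    return pulses_per_bit * m1 * m2 * n1 / t_b


if __name__ == "__main__":
    print(f"VIPs per bit time:   {M1 * M2}")
    print(f"Rate, one pulse:     {multichannel_rate(M1, M2, N1, T_B):.2e} mults/sec")
    print(f"Rate, 32 pulses:     {multichannel_rate(M1, M2, N1, T_B, 32):.2e} mults/sec")
```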


Many variations of the basic architecture of Figure 3 are possible. The input
point modulators can be replaced by a second multi-channel AO cell. The second dimension
of either modulator can be used to encode the data in digital or other representations.
Alternatively, the system can be made to perform $M_1$ correlations per channel (this achieves
$M_1$ digital multiplications on $N_1$-bit words). Finally, partial products can be accumulated by
time integration on the output detectors. All of these techniques provide methods to
increase the accuracy of the system and reduce dynamic range requirements (at the expense of a
reduced number of mults/sec). Since few applications require the large number of operations
possible in (9) and (10), such tradeoffs appear quite attractive and realistic. A detailed
analysis of the architecture of Figure 3 (or similar ones) shows that such architectures are
only appropriate for matrix-matrix multiplication, operation on partitioned larger-order
matrices, or similarly more complicated linear algebra operations.
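As an illustration of how a correlation-type operation can realize a digital multiplication, the sketch below multiplies two binary words by convolving their bit sequences and then weighting the resulting mixed-binary digits by powers of two. This is an assumed encoding chosen for illustration (convolution is correlation with one word reversed); the text does not specify the exact scheme, and all names are hypothetical.

```python
# Digital multiplication of two binary words by convolution of their bit sequences.


def to_bits(value, width):
    """Binary digits of a non-negative integer, least significant bit first."""
    return [(value >> k) & 1 for k in range(width)]


def convolve(a, b):
    """Discrete convolution of two digit sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def multiply_by_convolution(x, y, width):
    """Return (product, mixed-binary digits) for two width-bit words."""
    mixed = convolve(to_bits(x, width), to_bits(y, width))
    product = sum(digit << k for k, digit in enumerate(mixed))
    return product, mixed


if __name__ == "__main__":
    product, mixed = multiply_by_convolution(13, 11, 4)
    print(product, mixed)
```

Each mixed-binary digit is at most the word length, so each detector only needs to resolve a small number of levels, which is where the reduced dynamic range requirement comes from.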

3.  FORMAT  CONTROL  AND  DATA  FLOW  FOR  FLEXIBILITY 

As noted in Section 1, practical problems require far more complex operations than
a VIP or even a M-V multiplication. In Table 1, we list various operations, the associated
matrix and vector formatting of the data to the AO cell and the point modulators, plus where
the output produced is fed back to achieve more complicated operations and applications.

Each of these operations has been fully detailed in different publications, e.g., the solution
of banded matrices [4], the solution of triangular systems of equations [12], general
linear algebraic equation (LAE) solution by iterative or indirect algorithms [5],
matrix-matrix-matrix (M-M-M) multiplication [5], matrix decomposition [9,10] for direct LAE
solutions, etc. An attractive feature of the architecture of Figure 2 and all of the operations
noted in Table 1 is that the data and operations flow ideally with no dead time in the system.
From these brief remarks, data flow and format control are seen to provide considerable
flexibility.
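The role of feedback in solving an LAE $Ax = b$ can be illustrated with a minimal sketch in which a matrix-vector product is formed at every step and the corrected vector is fed back as the next input. The Richardson iteration used here is an assumption chosen for simplicity; the algorithms actually used are those detailed in the cited references, and the step size `omega` and all names are hypothetical.

```python
# Iterative solution of A x = b using repeated matrix-vector products with feedback.


def matvec(a, x):
    """Matrix-vector product (the operation the optical processor would perform)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def richardson(a, b, omega, iterations):
    """x <- x + omega * (b - A x), repeated a fixed number of times."""
    x = [0.0] * len(b)
    for _ in range(iterations):
        y = matvec(a, x)                                   # M-V product
        x = [x_i + omega * (b_i - y_i)                     # correction, fed back
             for x_i, b_i, y_i in zip(x, b, y)]
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(richardson(A, b, omega=0.2, iterations=200))
```

The iteration converges only when `omega` is small enough relative to the largest eigenvalue of the matrix; the example matrix is symmetric positive definite, so `omega = 0.2` is sufficient.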


TABLE 1. Format Control and Data Flow for Flexibility.

OPERATION | NOTATION | ENCODING (ROW, COL): AO CELL | ENCODING (ROW, COL): POINT MODS | FEEDBACK TO | APPLICATION
[illegible] | | A = a(t,x) (1 row per TB) | | AO | Solve Banded M-V and Triangular M-V (One Detector)
 | A b | A = a(f,t) (1 col per TB) | | Point Modulators | Solve LAE
M-M Multiplication | B A | [illegible] | B = b(t,x) | AO | MMM = C B A
M-M Multiplication | A B | A = a(f,t) | B = b(x,t) | AO | MMM = A B C; M Decomposition; M Inversion; Solve M Eqn
4. OPTICAL BEAM DIVERGENCE CONSIDERATIONS

As seen in Section 2, the bit time $T_B$ is a key parameter affecting the system computation
rate. As we will show below, $T_B$ also significantly affects fabrication. The center-to-center
spacing $T_B$ of packets of data in the AO cell should be largely filled with the
information packet (a fill ratio of 0.5 is quite practical). If we denote the physical size
along the AO cell (associated with $T_B$) by $d_B$, we find

$d_B = v_sT_B$, (11)

where $v_s$ is the velocity of sound in the AO cell. For $T_B = 0.1\,\mu$sec (a 10 MHz data rate per
channel), $d_B = 62\,\mu$m (for a TeO2 AO cell) and $d_B = 657\,\mu$m (for a LNB AO cell). These
parameters significantly affect fabrication of the system. For $N_1 = 200$ and TeO2
with $T_A = 40\,\mu$sec, the above $d_B$ parameters are appropriate. Larger $N_1$ values (or cells with
lower $T_A$ values) will require considerably smaller $d_B$ values and will thus introduce
significant practical fabrication problems.

Even with the above $d_B$ values, typical point modulators have physical sizes or
center-to-center spacings $d_s$ larger than the required $d_B$. Thus, a demagnification of the input point
modulator array (by a factor $M$) is required when imaging the input sources onto the AO cell,
i.e. we require

$d_B = d_s/M$. (12)
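Equations (11) and (12) can be evaluated with a minimal sketch. The acoustic velocities below are assumptions inferred from the $d_B$ values quoted in the text (roughly 617 m/sec for slow-shear TeO2 and 6570 m/sec for LNB), and the 1.0 mm source spacing is only an example value; all names are illustrative.

```python
# Packet size d_B = v_s * T_B and the demagnification M = d_s / d_B it implies.
SOUND_VELOCITY = {      # m/sec, assumed values consistent with the quoted d_B
    "TeO2": 617.0,
    "LiNbO3": 6570.0,
}


def packet_size(material, t_b):
    """Physical length of one data packet in the AO cell, in meters."""
    return SOUND_VELOCITY[material] * t_b


def demagnification(d_s, d_b):
    """Demagnification factor needed to image a source spacing d_s onto d_b."""
    return d_s / d_b


if __name__ == "__main__":
    t_b = 0.1e-6        # bit time in seconds
    d_s = 1.0e-3        # example source spacing in meters (assumed)
    for material in SOUND_VELOCITY:
        d_b = packet_size(material, t_b)
        print(f"{material}: d_B = {d_b * 1e6:.1f} um, M = {demagnification(d_s, d_b):.1f}")
```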

Another vital and practical fabrication issue is the divergence $\pm\theta_D$ of the input
light incident on each $T_B$ packet of data in the AO cell. It is well-known that the
divergence $\theta_D$ of the input light affects the frequency resolution of an AO spectrum analyzer. In
the frequency-multiplexed deflector system, this affects the spacings $\Delta f$ and center
frequency $f_c$ of the data. In the modulator architecture, this affects the spacing $T_B$ of the
data bits or packets. In the following paragraphs, we quantify the effect of $\theta_D$ on the
performance of AO systolic processors with specific attention to different point modulator
choices and different point modulator to AO cell interconnection techniques.

All AO modulators require a separation of the zero and first-order beams. This separation
is $2\theta_B$ (where $\theta_B$ is the Bragg angle). If the input light has a divergence $\theta_D$, the zero and
first-order beams will also have a divergence $\theta_D$. Thus, separation of the two beams requires

$2\theta_D \le 2\theta_B$. (13)

The Bragg angle $\theta_B$ satisfies

$2\theta_B = \lambda_0/\Lambda = \lambda_0f_c/v_s$, (14)

where $\lambda_0$ is the optical wavelength, $\Lambda$ is the acoustic wavelength and $f_c$ is the center
frequency of the AO cell. Thus, we require

$\theta_D \le \frac{1}{2}\lambda_0f_c/v_s$. (15)

Thus, as $\theta_D$ increases, a larger $f_c$ is required, and acoustic attenuation effects increase.
To quantify these issues, we note that for a LNB AO cell at $\lambda_0 = 820$ nm

$2\theta_D(\max) = 2.3^\circ$ to $6.1^\circ$ (16)

as $f_c$ varies from 300 to 800 MHz. For TeO2, as $f_c$ varies from 40 to 100 MHz, we require

$2\theta_D(\max) = 3.0^\circ$ to $7.6^\circ$. (17)
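The beam-separation limit $2\theta_D(\max) = 2\theta_B = \lambda_0f_c/v_s$ can be evaluated with the sketch below. The acoustic velocities are assumptions (about 617 m/sec for slow-shear TeO2 and 6570 m/sec for LNB); with these values the TeO2 results agree with (17), while the LNB results come out somewhat below the figures quoted in (16), which suggests a slightly different velocity was used there. All names are illustrative.

```python
import math

# Maximum full divergence angle allowed by zero/first-order beam separation.
WAVELENGTH = 0.82e-6    # optical wavelength in meters
SOUND_VELOCITY = {      # m/sec, assumed values
    "TeO2": 617.0,
    "LiNbO3": 6570.0,
}


def bragg_separation_deg(material, f_c):
    """Separation 2*theta_B = lambda_0 * f_c / v_s, returned in degrees."""
    return math.degrees(WAVELENGTH * f_c / SOUND_VELOCITY[material])


def min_center_frequency(material, theta_d):
    """Smallest f_c (Hz) that still separates beams with half-divergence theta_d (rad)."""
    return 2.0 * theta_d * SOUND_VELOCITY[material] / WAVELENGTH


if __name__ == "__main__":
    for material, f_lo, f_hi in (("LiNbO3", 300e6, 800e6), ("TeO2", 40e6, 100e6)):
        print(f"{material}: {bragg_separation_deg(material, f_lo):.2f} deg "
              f"to {bragg_separation_deg(material, f_hi):.2f} deg")
```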


Next, we consider the effects of a divergence $\pm\theta_D$ or $\Delta\theta = 2\theta_D$ on the frequency resolution
$\Delta f$ and the minimum bit separation $T_B$. A beam divergence $\theta$ is equivalent to a spread $\Delta f$ in
the input RF frequency where

$\Delta f = 2(M\theta')v_s/\lambda_0 = 2\theta_Dv_s/\lambda_0$, (18)

where $\theta'$ is the divergence of the source and $M\theta' = \theta_D$ is the divergence of the optical beam
as it enters the AO cell. This effect in (18) limits the $\Delta f$ between multiplexed frequencies.
Similar effects appear to be present on the minimum $T_B$ allowed. In the conventional
processors of Figure 1 (using the AO cell as a modulator), the nominal $T_B$ is set by $T_A/2 = T = NT_B$,
i.e. $N$ packets of data can be used (where $2N$ = TBWP). However, when $\theta_D$ is included, the
interaction length $L$, the size of the AO transducer, the thickness of the AO cell and the
Bragg sensitivity all enter. In general, it appears that the TBWP or the number $N$ of bit
times $T_B$ allowed in this system is affected similarly by the presence of a $\Delta\theta$. Specifically,
the $\Delta f$ increase reduces the number of resolvable frequencies to BW/$\Delta f$ (where BW is the
bandwidth of the device) and this correspondingly reduces the number of bit times allowable.

Thus, when $\theta_D$ is present, the number of bit times allowed in the AO modulator architectures
is also reduced. Since the frequency-multiplexed architecture uses a larger $T_B$, it is
far less susceptible to this effect than are the AO modulator architectures, where $N$ (and
hence the computation rate) is directly reduced as $\theta_D$ effects are included.

To quantify the magnitude of this effect, we note that for $T_B = 0.1\,\mu$sec and $\lambda_0 = 0.82\,\mu$m,
we find for LNB that $\theta_D = 1^\circ = 17.45$ mrad requires a $\Delta f = 250$ MHz and for $\theta_D = 3$ mrad we require
$\Delta f = 47$ MHz. For TeO2 (slow shear), $\theta_D = 17.45$ mrad corresponds to $\Delta f = 25.2$ MHz and $\theta_D = 3$ mrad
corresponds to $\Delta f = 4.5$ MHz. For LNB and TeO2 cells with typical bandwidths, a large
divergence angle of 1° thus has quite severe effects. As noted earlier, these effects on $T_B$ are
comparable. The $\theta_D$ and $T_B$ effects are less significant, however, for the frequency-multiplexed
AO modulator/deflector architecture.
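The frequency spread in (18) and the resulting number of resolvable frequencies BW/$\Delta f$ can be estimated with the sketch below. The acoustic velocities (617 m/sec for TeO2, 6570 m/sec for LNB) and the 50 MHz bandwidth are assumptions used only for illustration; with these velocities the computed spreads are close to, but not identical with, the values quoted in the text. All names are hypothetical.

```python
import math

# Frequency spread caused by beam divergence, and the resolvable-frequency count.


def frequency_spread(theta_d, v_s, wavelength):
    """Delta f = 2 * theta_D * v_s / lambda_0, with theta_D in radians."""
    return 2.0 * theta_d * v_s / wavelength


def resolvable_frequencies(bandwidth, delta_f):
    """Whole number of frequencies of width delta_f that fit in the bandwidth."""
    return int(bandwidth // delta_f)


if __name__ == "__main__":
    wavelength = 0.82e-6
    bandwidth = 50e6                      # assumed device bandwidth in Hz
    cases = (("LiNbO3", 6570.0), ("TeO2", 617.0))
    for material, v_s in cases:
        for theta_d in (math.radians(1.0), 3e-3):
            delta_f = frequency_spread(theta_d, v_s, wavelength)
            print(f"{material}, theta_D = {theta_d * 1e3:.2f} mrad: "
                  f"delta f = {delta_f / 1e6:.1f} MHz, "
                  f"resolvable = {resolvable_frequencies(bandwidth, delta_f)}")
```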

5.  SOURCES,  INTERCONNECTIONS  AND  EXPERIMENTAL  RESULTS 

One attractive technique for demagnifying a linear array of point modulator sources (LEDs
or laser diodes) onto the AO cell, while maintaining low divergence $\theta_D$ at the cell, is shown
in Figure 4. As depicted in this figure, the point modulators are first focused into optical
fibers using graded index (GRIN) optical elements (G1). The fiber optic (FO) interconnections
allow the source spacings (which are generally quite large) to be reduced to the
center-to-center spacing of the GRIN elements (G2) placed at the opposite end of the FO assembly.
The primary purpose of the G1 elements is to provide high coupling
efficiency from the point modulators to the fibers. The primary purpose of the FO link is
to increase the packing density of the sources and to reduce the center-to-center source
spacing. The G2 elements produce well collimated, separate optical
light channels incident on the AO cell. The GRIN optical elements we have used have a 0.29
pitch and a 1 mm diameter (for G1) and a 0.25 pitch, a 1 mm center-to-center spacing with an
active optical output beam diameter of 0.4 mm and $f_L = 1.1$ mm (for G2). Such an
interconnection system provides parallel output beams from G2 from separate input point modulators
with a corresponding $d_s = 1.0$ mm and an active beam diameter of 0.4 mm. These parameters are
quite compatible with the requirements for several of our AO cell systems. Additional beam
reducing optics can be included between the G2 outputs and the AO cell (as
shown in Figure 4) to further reduce the center-to-center spacing (if this is required by
the AO cell and its $T_B$ or $d_B$ value).

The portion of each 1.0 mm diameter GRIN lens that contains light is the active source
size from G2. This is set by $f_L = 1.1$ mm of G2 and N.A. = 0.19 of the fiber. With multimode
fibers with a 50 µm core, the core diameter $D$ sets the divergence from G2. For such a
system, we find an active source diameter (at the input to the AO cell) of

$d' = 2(\text{N.A.})f_L \approx 400\,\mu$m (19)

FIGURE 4: GRIN Optics Source and AO Cell Coupling Using Fiber Optics (FO).
FIGURE 5: Multi-Channel Point Modulator AO Input System Architecture.
FIGURE 6: Architecture Using Separate Laser Diodes and Individual Collimating Optics.

and a beam divergence given by

$\tan\theta_D \approx \theta_D = (D/2)/f_L = 25\,\mu\text{m}/1.1\,\text{mm} \approx 22$ mrad. (20)

For a single-mode fiber (SMF) with a 6 µm core, we obtain

$d' \approx 350\,\mu$m, and $\theta_D \approx 3$ mrad. (21)

Our experimental tests have verified all of the above theoretical parameters of the two
indicated interconnection architectures. Other experiments we performed verified the
theoretical $\Delta f$ associated with the various $\Delta\theta$ values noted above.
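The source diameter and divergence estimates for the GRIN output can be reproduced with a minimal sketch, using the 1.1 mm focal length and 0.19 numerical aperture quoted in the text. The function names are illustrative, and the small-angle approximation $\tan\theta_D \approx \theta_D$ is assumed.

```python
# Active source diameter and beam divergence at the output of the G2 GRIN element.


def active_source_diameter(numerical_aperture, focal_length):
    """d' = 2 * N.A. * f, in the same units as focal_length."""
    return 2.0 * numerical_aperture * focal_length


def beam_divergence(core_diameter, focal_length):
    """theta_D = (D / 2) / f in radians (small-angle approximation)."""
    return (core_diameter / 2.0) / focal_length


if __name__ == "__main__":
    f_grin = 1.1e-3   # G2 focal length in meters
    print(f"d' (N.A. = 0.19):        {active_source_diameter(0.19, f_grin) * 1e6:.0f} um")
    print(f"theta_D, 50 um core:     {beam_divergence(50e-6, f_grin) * 1e3:.1f} mrad")
    print(f"theta_D, 6 um core:      {beam_divergence(6e-6, f_grin) * 1e3:.1f} mrad")
```

The single-mode value of about 2.7 mrad is consistent with the rounded 3 mrad quoted in (21).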

Other possible linear point source alternatives include the use of a multi-channel AO
cell (Figure 5) and the use of laser diodes with separate individual collimating optics
(Figure 6). Each of these architectures is an attractive alternative that is
appropriate for various applications. The multi-channel AO input cell architecture requires
demagnification optics. It has the advantage of a very low divergence angle; however, its
performance is generally limited by the optical and electrical isolation achievable between
the separate AO channels. The use of such an input to an optical matrix-vector processor is
thus probably restricted (within the near-term) to systems employing data encoding for
reduced dynamic range and improved accuracy. As multi-channel AO devices mature, such systems
may become more appropriate for analog matrix-vector applications. The use of separate
laser diodes with individual collimating optics for each source is quite attractive since
several such units are commercially available. The divergence angle obtainable from such
systems appears to be adequate to allow simple beam-reducing optics to be employed (without
the need for the GRIN-FO-GRIN system in Figure 4).

6.  BIPOLAR  AND  COMPLEX-DATA  HANDLING 

The issue of handling bipolar and complex-valued data in optical systolic processors has
often not been detailed. In Figure 7, we show a new architecture that is appropriate for
such data. For the input point sources, we spatially-multiplex two linear arrays of point
modulators. With such an arrangement, we can represent bipolar data by inputting positive
valued vector elements on one input array and negative valued vector elements on the other
input array. Thus, which input array contains non-zero elements will determine whether the
input data is positive or negative valued. For complex-valued data, three linear input
arrays would be employed. As shown in Figure 7, the light from each input array passes
through the AO cell at a different angle and hence the matrix-vector product of the
corresponding input vector and the matrix within the AO cell appears on a separate linear output
detector array (in a different vertical location). We now direct attention to the data
input to the AO cell in Figure 7. In this figure, we show three multiplexed frequency inputs
to the AO cell. These can be used to represent complex-valued data (by encoding such data
with its projections on the 0°, 120° and 240° axes in the complex plane). For the
architecture shown, a bipolar input vector is multiplied by a complex-valued matrix and the
corresponding matrix-vector product is formed on separate linear output detector arrays.

The post-processing required to convert this output data for feedback to the system in a
compatible form is quite simple.
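The two encodings described above can be sketched in a few lines: a bipolar vector is split into two non-negative arrays whose products are subtracted afterwards, and a complex value is written as a non-negative combination of the unit phasors at 0°, 120° and 240°. The decomposition routine is one common way to obtain such components and is an assumption here, as are all function names.

```python
import cmath
import math

# Unit phasors at 0, 120 and 240 degrees in the complex plane.
PHASORS = [cmath.exp(2j * math.pi * k / 3) for k in range(3)]


def split_bipolar(x):
    """Split a real vector into non-negative parts with x = pos - neg."""
    return [max(v, 0.0) for v in x], [max(-v, 0.0) for v in x]


def matvec(a, x):
    """Matrix-vector product on non-negative inputs (one detector array)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def bipolar_matvec(a, x):
    """Form the product for each input array, then subtract in post-processing."""
    pos, neg = split_bipolar(x)
    return [p - n for p, n in zip(matvec(a, pos), matvec(a, neg))]


def encode_complex(z):
    """Three non-negative weights c with z = sum(c[k] * PHASORS[k])."""
    weights = [0.0, 0.0, 0.0]
    if z == 0:
        return weights
    phi = cmath.phase(z) % (2 * math.pi)
    k = int(phi // (2 * math.pi / 3)) % 3      # sector containing z
    n = (k + 1) % 3                            # next phasor counter-clockwise
    p_k, p_n = PHASORS[k], PHASORS[n]
    det = p_k.real * p_n.imag - p_k.imag * p_n.real
    # Solve z = a * p_k + b * p_n for real a and b (Cramer's rule).
    a = (z.real * p_n.imag - z.imag * p_n.real) / det
    b = (p_k.real * z.imag - p_k.imag * z.real) / det
    weights[k], weights[n] = max(a, 0.0), max(b, 0.0)
    return weights


def decode_complex(weights):
    """Recombine the three components into a complex value."""
    return sum(w * p for w, p in zip(weights, PHASORS))


if __name__ == "__main__":
    print(bipolar_matvec([[1, 2], [3, 4]], [1, -1]))
    c = encode_complex(-1 + 2j)
    print(c, decode_complex(c))
```

At most two of the three components are non-zero for any value, so each complex element occupies two of the three multiplexed channels at a time.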


7.  SUMMARY  AND  CONCLUSION 

In this paper, we have described many practical fabrication issues associated with optical
systolic array processors. The performance (mults/sec) of several different architectures
has been quantified and compared. A frequency-multiplexed architecture was shown to
require greatly simplified fabrication and to yield equivalent performance to that achieved
on other architectures. We noted that by format control, many different linear algebraic
operations were possible on the same architecture and that all such operations provide quite
ideal data and operational flow. The effects of the bit element size at the AO cell and the
divergence of the optical beam entering the AO cell were noted and quantified. Three new
architectures were suggested that appear appropriate for fabrication of a realistic and
practical optical systolic array architecture. These systems include the detailed issues of
source size, source spacing, and fill-ratio, as well as the details of the source-to-AO-cell
coupling, and the aforementioned issues of bit size and divergence. Finally, a new spatial
and frequency-multiplexed architecture was described to allow handling of complex-valued and
bipolar matrix and vector data.
17. D. CASASENT, C.P. Neuman and J. Lycas, "Optical Kalman Filtering for Missile Guidance",
Applied Optics, Vol. 23, pp. 1960-1966, July 1984.

18. J. Jackson and D. CASASENT, "A State Estimation Kalman Filter Using Optical Processing:
Noise Statistics Known", Applied Optics, Vol. 23, pp. 376-378, February 1984.

19. J. Jackson and D. CASASENT, "Optical Systolic Array Processor Using Residue Arithmetic",
Applied Optics, Vol. 22, pp. 2817-2821, September 1983.

20.  D.  CASASENT,  A.  Ghosh  and  C.P.  Neuman,  "Iterative  Solutions  to  Nonlinear  Matrix 
Equations  Using  a  Fixed  Number  of  Steps",  Proc.  SPIE,  Vol.  495,  August  1984. 

21.  D.  CASASENT  and  J.  Jackson,  "Fabrication  Considerations  for  Acousto-Optic  Systolic 
Processors",  Proc.  SPIE,  Vol.  465,  pp.  104-112,  January  1984. 

22.  D.  CASASENT,  "Linear  Algebra  Techniques  for  Pattern  Recognition:  Feature  Extraction 
Case  Studies",  Proc.  SPIE,  Vol.  431,  pp.  263-269,  August  1983. 

23.  D.  CASASENT,  "Coherent  Optical  Pattern  Recognition:  A  Review",  Optical  Engineering,  24, 
Special  Issue,  January  1985. 
16. PUBLICATIONS AND PRESENTATIONS

16.1  PUBLICATIONS  (AFOSR  SUPPORTED,  1979-DATE) 

Publications from 30 September 1979 - 30 September 1980 on work performed under
AFOSR-79-0091 are listed in Section 16.1.1. Publications during 30 September 1980 - 30 September 1981
follow in Section 16.1.2, and publications in FY82 and FY83 continue in Sections 16.1.3 and 16.1.4. New
publications from September 1983 - September 1984 follow in Section 16.1.5. A list of presentations at
conferences, companies, and seminars on our AFOSR research conducted during the prior year then
follows.


16.1.1 PUBLISHED PAPERS UNDER AFOSR SUPPORT (30 SEPTEMBER 1979 - 30 SEPTEMBER 1980)


1. "Photo-DKDP Light Valve in Optical Data Processing", Applied Optics, 18,
3307-3314, October 1979 (Casasent, Luu).

2. "Coherent Optical Pattern Recognition", Nikkei Electronics, 150-181,
October 1979 (in Japanese) (Casasent).

3. "Optical Data Processing for Advanced Missile Guidance Needs", AIAA,
October 1979 (Casasent).

4. "Spread Spectrum Optical Signal Processors", Proc. EOSD, 333-342, October
1979 (Casasent, Psaltis).

5. "Space Blur Bandwidth Product in Correlator Performance Evaluation", JOSA,
70, 103-110, January 1980 (Kumar, Casasent).

6. "Optical Image Processing", EOSD, Tokyo, January 1980 (in Japanese)
(Casasent).

7. "Optical Signal Processing", EOSD, Tokyo, January 1980 (in Japanese)
(Casasent).

8. "Beyond Matched Filtering", Opt. Engr., 19, 152-156, March 1980 (Caulfield
et al).

9. "Multivariant Technique for Multi-Class Pattern Recognition", Applied
Optics, 19, 1758-1761, June 1980 (Psaltis, Casasent).

10. "Optical Fourier Transform Techniques for Advanced Fourier Spectroscopy",
Applied Optics, 19, 2034-2037, June 1980 (Casasent, Psaltis).

11. "Nonlinear t-E Curve Effects in an Optical Correlator", Opt. Commun., 34,
4-6, July 1980 (Kumar, Casasent).

12. "Correlation of Images with Random Contrast Reversals", SPIE, 238,
156-165, July 1980 (Barniv, Mostafavi, Casasent).

13. "A Laser Diode Lensless MSF-HOE Correlator", Applied Optics, 19,
2653-2654, August 1980 (Caimi et al).


16.1.2 PUBLISHED PAPERS UNDER AFOSR SUPPORT (30 SEPTEMBER 1980 - 30 SEPTEMBER 1981)

14. "Hybrid Processor to Compute Invariant Moments for Pattern Recognition",
Opt. Lett., 395-397, September 1980 (Casasent, Psaltis).

15. "Optical Word Recognition, Case Study in Coherent Optical Pattern
Recognition", Opt. Engr., 19, 716-721, September 1980 (Casasent et al).

16. "Lensless Matched Spatial Filter Correlator Experiments", Opt. Commun.,
34, 311-315, September 1980 (M. Shen et al).

17. "HOE/Lensless Matched Spatial Filter Correlator Experiments", Opt. Commun.,
34, 316-320, September 1980 (M. Shen et al).

18. "A Laser Diode/Lensless MSF Optical Pattern Recognition System", EOSD,
46-52, November 1980 (Casasent et al).

19. "Optical Pattern Recognition: Matched Spatial Filter Processors", EOSD,
33-39, November 1980 (Casasent).

20. "Optical Pattern Recognition: Beyond Matched Spatial Filtering", EOSD,
39-47, March 1981 (Casasent).

21. "Pattern Recognition: A Review", IEEE Spectrum, 28-33, March 1981
(Casasent).

22. "Processing Flexibility by Hybrid Optical/Digital Techniques", Proc. Workshop
of Future Directions in Optical Data Processing, Texas Tech. Rept.,
1 March 1981, 17-23 (Casasent, Kumar).

23. "Beyond Holographic Matched Filtering", Israel Journal of Technology, 18,
255-260, March 1981 (Casasent).

24. "Binarization Effects in a Correlator with Noisy Input Data", Applied Optics,
20, 1433-1438, April 1981 (Kumar, Casasent).

25. "Correlation of Images with Random Contrast Reversals", SPIE, 238,
156-165, July 1980 (Barniv, Mostafavi, Casasent).

26. "Image Quality Effects in Optical Correlators", SPIE, 310, 183-192, August
1981 (Casasent, Eiva, Kumar).

27. "Multisensor Image Registration: Experimental Verification", SPIE, 292,
160-171, August 1981 (Barniv, Casasent).

28. "Intra-Class IR Tank Pattern Recognition Using SDFs", SPIE, 292, 25-33,
August 1981 (Hester, Casasent).

29. "Inter-Class Discrimination Using SDFs", SPIE, 302, 108-116, August 1981
(Hester, Casasent).


16.1.3 PUBLISHED PAPERS UNDER AFOSR SUPPORT (30 SEPTEMBER 1981 - 30 SEPTEMBER 1982)

30. "An Iterative Optical Processor: Selective Survey of Operations Achievable",
Proceedings NASA Langley Conference on Optical Information Processing,
Publication 2207, August 1981, 105-118 (Casasent, Neuman).

31. "A Review of Optical Signal Processing", IEEE Commun., 40-48, September
1981 (Casasent).

32. "Optical Signal Processing II: Applications, Systems and New Techniques",
EOSD, 41-47, September 1981 (Casasent).

33. "The Soviet Priz Spatial Light Modulator", Applied Optics, 20, 3090-3092,
September 1981 (Casasent, Caimi, Khomenko).

34. "A Laser Diode/HOE Pattern Recognition System", Acta Optica Sinica, 1,
401-410, September 1981 (Casasent et al).

35. "Eigenvector Determination by Iterative Optical Methods", Applied Optics,
20, 3707-3710, November 1981 (Kumar, Casasent).

36. "A New Soviet BSO Light Modulator for Optical Data Processing", Proc. EOSD,
297-303, November 1981 (Casasent, Caimi).

37. "A Correlator for Optimum Two-Class Discrimination", Proc. EOSD, 321-330,
November 1981 (Casasent et al).

38. "Test and Evaluation of the Soviet Prom and Priz Spatial Light Modulators",
Applied Optics, 20, 4215-4220, December 1981 (Casasent, Caimi, Khomenko).

39. "A Microprocessor-Based Fiber-Optic Iterative Optical Processor", Applied
Optics, 21, 147-152, January 1982 (Carlotto, Casasent).

40. "Principal Component Imagery for Statistical Pattern Recognition Correlators",
Opt. Engr., 21, 43-47, January/February 1982 (Kumar, Casasent).

41. "Adaptive Phased Array Radar Processing Using an Optical Matrix-Vector
Processor", SPIE, 341, May 1982 (Casasent, Carlotto).

42. "New Research in Holographic Pattern Recognition", Proc. SPIE, 353, 6-11,
August 1982 (Casasent).

43. "Synthetic Discriminant Functions for 3-D Object Recognition", Proc. SPIE,
360, 136-142, August 1982 (Casasent, Kumar, Sharma).

44. "Multidimensional Adaptive Radar Array Processing Using an Iterative Optical
Matrix-Vector Processor", Opt. Engr., 21, 814-821, September 1982 (Casasent,
Carlotto).
16.1.4 PUBLISHED PAPERS UNDER AFOSR SUPPORT (30 SEPTEMBER 1982 - 30 SEPTEMBER 1983)

45. "Advanced Acousto-Optic Signal Processors", Proc. SPIE, 352, 50-58, August
1982 (Casasent).

46. "A Fisher Discriminant Approach to Distortion-Invariant Pattern Recognition
Using Autocorrelations", Lasers and Electro-Optics, 34, 18-23, September
1982 (Casasent, Chang).

47. "Realization of a Sobel Operator by Coherent Optical Techniques", Lasers and
Electro-Optics, 34, 24-30, September 1982 (Chen, Casasent).

48. "Applications of the Priz Light Modulator", Applied Optics, 21, 3846-3854,
November 1982 (Casasent, Caimi, Petrov, Khomenko).

49. "Frequency-Multiplexed and Pipelined Iterative Optical Systolic Array
Processors", Applied Optics, 22, 115-124, January 1983 (Casasent, Jackson,
Neuman).

50. "Optical Linear Algebra", SPIE, 388, January 1983 (Casasent, Ghosh).

51. "Nonlinear Local Image Preprocessing Using Coherent Optical Techniques",
Applied Optics, 22, 808-814, March 1983 (Casasent, Chen).

52. "Performance of Synthetic Discriminant Functions for Infrared Ship Classification", IOCC
Conference, Boston, Massachusetts, April 1983, IEEE Cat. No. CH1880-4/83, SPIE Vol.
422, pp. 193-196 (CASASENT, Sharma).

53. "Guidelines for Efficient Use of Optical Systolic Array Processors", IOCC Conference, Boston,
Massachusetts, April 6-8, 1983, IEEE Cat. No. CH1880-4/83, SPIE Vol. 422, pp.
209-213 (CASASENT).

54. "Recent Advances in Optical Signal Processing", CLEO Conference, May 17-20, 1983, Baltimore,
Maryland (CASASENT).

55. "Developments in Acousto Optic Signal Processing", Trends and Perspectives in Signal Processing,
Vol. 3, No. 2, pp. 1-6, June 1983 (CASASENT).

56. "Generalized Chord Transformation for Distortion-Invariant Optical Pattern Recognition", Applied
Optics, 22, pp. 2087-2094, July 1983 (CASASENT, Chang).

57. "LU and Cholesky Decomposition on an Optical Systolic Array Processor", Optics Communications,
46, pp. 270-273, July 1983 (CASASENT, Ghosh).

58. "Direct and Indirect Optical Solutions to Linear Algebraic Equations: Error Source Modeling", Proc.
SPIE, 431, pp. 201-208, August 1983 (CASASENT, Ghosh, Neuman).

59. "Linear Algebra Techniques for Pattern Recognition: Feature Extraction Case Studies", SPIE, 431,
pp. 263-269, August 1983 (CASASENT).

60. "Shift-Invariant and Distortion-Invariant Object Recognition", SPIE, 442, pp. 47-55, August 1983
(CASASENT, Sharma).


16.1.5  PAPERS  PUBLISHED  AND  SUBMITTED  UNDER  AFOSR  SUPPORT 
(SEPTEMBER  1983  -  SEPTEMBER  19841 

61.  "Fourier  Transform  Feature-Space  Studies",  Proc.  SPIE,  449,  pp.  2-8,  November  1983  (CASASENT, 

Sharma). 

62.  "Direct  and  Implicit  Optical  Matrix-Vector  Algorithms",  Applied  Optics,  22,  pp.  3572-3578, 

November  1983  (CASASENT,  Ghosh). 

63.  "Optical  Kalman  Filtering  for  Missile  Guidance",  ICALEO’83,  Laser  Institute  of  America,  41,  pp. 

70-78,  Los  Angeles,  California,  November  1983,  (CASASENT,  Neuman,  Lycas). 

64.  "Recent  Advances  in  Optical  Pattern  Recognition",  Proc.  SPIE,  456,  January  1984  (CASASENT, 

Fetterly). 

65.  "Fabrication  Considerations  for  Acousto-Optic  Systolic  Processors",  Proc.  SPIE,  465,  pp.  104-112, 

January  1984  (CASASENT,  Jackson). 

66.  "A  State  Estimation  Kalman  Filter  Using  Optical  Processing:  Noise  Statistics  Known",  Applied 

Optics,  23,  pp.  376-378,  February  1984  (Jackson,  CASASENT). 

67.  "Unified  Synthetic  Discriminant  Function  Computational  Formulation",  Applied  Optics,  23,  pp. 

1620-1627,  May  1984  (CASASENT). 

68.  "Direct  and  Implicit  Optical  Matrix-Vector  Algorithms:  Addendum",  Applied  Optics,  23,  p.  1450, 

May  1984  (CASASENT,  Ghosh). 

69.  "Acousto-Optic  Linear  Algebra  Processors:  Architectures,  Algorithms  and  Applications",  Proc. 

IEEE,  Special  Issue  on  Optical  Computing,  72,  pp.  831-849,  July  1984  (CASASENT). 

70. "Optical Kalman Filtering for Missile Guidance", Applied Optics, 23, pp. 1960-1966, July 1984

(CASASENT,  Neuman,  Lycas). 

71.  "Time-Integrating  Acousto-Optic  Correlator:  Error  Source  Modeling",  Applied  Optics,  23,  pp. 

3230-3237,  September  1984  (CASASENT,  Goutzoulis,  Kumar). 

72.  "Acousto-Optic  Processor  for  Adaptive  Radar  Noise  Environment  Characterization",  Accepted  for 

publication,  Applied  Optics,  1984  (Goutzoulis,  CASASENT,  Kumar). 

73.  "Feature  Extractors  for  Distortion-Invariant  Robot  Vision",  Optical  Engineering,  23,  pp.  492-498, 

October  1984  (CASASENT,  Sharma). 

74.  "Projection  Synthetic  Discriminant  Function  Performance",  Optical  Engineering,  23,  pp.  716-720, 

November  1984  (CASASENT,  Rozzi,  Fetterly). 

75. "A Quadratic Matrix Algorithm for Linear Algebra Processors", IEEE Trans. SMC,

Submitted, August 1984 (CASASENT, Ghosh, Neuman).


76.  "Image  Segmentation  and  Real-Image  Tests  for  an  Optical  Moment-Based  Feature  Extractor", 

Optics  Communications,  51,  pp.  227-230,  September  1984  (CASASENT,  Cheatham). 

77.  "Hierarchical  Pattern  Recognition  Using  Parallel  Feature  Extraction",  Proc.  ASME,  August  1984 

(CASASENT,  Cheatham). 

78.  "Hierarchical  Fisher  and  Moment-Based  Pattern  Recognition",  Proc.  SPIE,  504,  August  1984 

(Cheatham,  CASASENT). 

79. "Iterative Solutions to Nonlinear Matrix Equations Using a Fixed Number of Steps", Proc. SPIE,

495,  August  1984  (CASASENT,  Ghosh,  Neuman). 

80. "SDF Control of Correlation Plane Structure for 3-D Object Representation and Recognition", Proc.

SPIE,  507,  August  1984  (Chang,  CASASENT,  Fetterly). 

81. "Iterative Optical Vector-Matrix Processor", SPIE, 373, pp. 111-116, February 1981 (Carlotto,

CASASENT). 

82.  "Optical  Linear  Algebra  Processors:  Noise  and  Error  Source  Modeling",  Optics  Letters,  Submitted 

September  1984,  (CASASENT,  Ghosh). 


16.2 SEMINARS, CONFERENCES, ETC. PRESENTATIONS OF AFOSR
RESEARCH  (1  SEPTEMBER  1983  -  30  SEPTEMBER  1984) 

October  1983 

1.  Washington,  D.C.,  "Acousto-Optic  Research  Possibilities". 

2.  DARPA  -  Washington,  D.C.,  "Advanced  Optical  Pattern  Recognition  Algorithms, 
Architectures,  and  Systems". 

3.  Carnegie-Mellon  University,  Sophomore  Seminar  -  Pittsburgh,  Pennsylvania,  "Optical 
Information  Processing". 


November  1983 

4.  SPIE  Conference  -  Cambridge,  Massachusetts,  "Fourier  Transform  Feature-Space  Studies". 

5.  SPIE  Conference  -  Cambridge,  Massachusetts,  "Direct  and  Implicit  Optical  Matrix-Vector 
Algorithms". 

6.  Laser  Institute  of  America  Conference  -  Los  Angeles,  California,  "Optical  Kalman  Filtering  for 
Missile  Guidance". 

7.  VOIS  Inc.  -  Binghamton,  New  York,  "Optical  Pattern  Recognition". 

8.  Carnegie-Mellon  University,  ECE  Department  -  Pittsburgh,  Pennsylvania  -  "Optical 
Information  Processing". 

9. Carnegie-Mellon University, Presented to NASA Lewis - Pittsburgh, Pennsylvania, "Optical
Linear Algebra".

December  1983 

10.  Stanford  University  -  Stanford,  "Optical  Systolic  Processors". 

11.  Chevron  Oil  Field  Research  Co.  -  La  Habra,  California,  "Optical  Information  Processing". 

12.  University  of  California  at  Santa  Barbara  -  Santa  Barbara,  California,  "Optical  Information 
Processing". 

January  1984 

13.  SPIE  Conference  -  Los  Angeles,  California,  "Recent  Advances  in  Optical  Pattern 
Recognition".

14.  Teledyne  Electronics  -  Newbury  Park,  California,  "Optical  Signal  Processing". 

15.  SPIE  Conference  -  Los  Angeles,  California,  "Fabrication  Considerations  for  Acousto-Optic 
Systolic Processors".


February  1984 

16.  Polytechnic  Institute  -  Brooklyn,  New  York,  "Optical  Processing  for  Robotics". 

17.  Robotics  Institute,  Carnegie-Mellon  University  -  Pittsburgh,  Pennsylvania,  "Optical 
Information  Processing". 

March  1984 

18.  Washington,  D.C.,  "Optical  Data  Processing". 

19.  Carnegie-Mellon  University,  Professional  Education  Program  -  Pittsburgh,  Pennsylvania, 
"Optical Pattern Recognition".

20.  Carnegie-Mellon  University,  Professional  Education  Program  -  Pittsburgh,  Pennsylvania, 
"Optical Signal Processing".


April  1984 

21. Carnegie-Mellon University, Professional Education Program - Pittsburgh, Pennsylvania,
"Optical Information Processing".

May  1984 

22.  Air  Force  Office  of  Scientific  Research  -  Washington,  D.C.,  "Optical  Information  Processing". 

23.  NASA  Langley  -  Hampton,  Virginia,  "Optical  Linear  Algebra". 

June 1984

24.  Carnegie-Mellon  University,  Presented  to  Westinghouse  R  &  D  -  Pittsburgh,  Pennsylvania, 
"Center for Excellence in Optical Data Processing".

August 1984

25. SPIE Conference - San Diego, California - "Iterative Solutions to Nonlinear Matrix Equations
Using  a  Fixed  Number  of  Steps". 

26.  SPIE  Conference  -  San  Diego,  California  -  "Hierarchical  Fisher  and  Moment-Based  Pattern 
Recognition".

27.  SPIE  Conference  -  San  Diego,  California  -  "SDF  Control  of  Correlation  Plane  Structure  for 
3-D  Object  Representation  and  Recognition". 

September  1984 

28.  Philips  Laboratories  -  Briarcliff,  NY  -  "Optics  and  Pattern  Recognition  in  Robotics". 


29.  Optical  Society  of  America  -  Pittsburgh,  PA,  "CMU  Center  for  Excellence  in  Optical  Data 
Processing". 

30.  Carnegie-Mellon  University  -  Pittsburgh,  PA,  "Signals  and  Systems  Research  in  ECE". 

31.  Westinghouse  Corporation  -  Baltimore,  MD,  "Center  for  Excellence  in  Optical  Data 
Processing". 


16.3  THESES  SUPPORTED  BY  AFOSR  FUNDING  (SEPTEMBER  1980  - 
SEPTEMBER  1984) 


1. Hiroyasu Murakami, M.S. Dissertation, "Matched Filter Statistical Correlator" (February
1981). 

2.  Saulius  Eiva,  M.S.  Dissertation,  "Image  Quality  Effects  in  Optical  Correlators"  (May  1981). 

3. Charles Hester, PhD Dissertation, "Synthetic Filters for Multi-Class Pattern Recognition"
(May  1981). 

4.  Yair  Barniv,  PhD  Dissertation,  "Multi-Sensor  Image  Registration"  (May  1981). 

5.  Mark  Carlotto,  PhD  Dissertation,  "Iterative  Electro-Optic  Matrix  Processor"  (May  1981). 

6.  Andrew  Sexton,  M.S.  Dissertation,  "Digital  Analysis  of  Space-Variant  Optical  Processors" 
(July  1981). 

7.  Bernard  Szymanski,  M.S.  Dissertation,  "A  Computer-Controlled  Film  Recorder  for  Optical 
Processing"  (July  1983). 

8. Vinod Sharma, PhD Dissertation, "Design and Analysis of Algorithms for Distortion-Invariant
Object Recognition" (January 1985).

9. R. Lee Cheatham, PhD Dissertation, "Moment-Based Object Recognition Using a Two-Level
Classifier" (April 1984).

10. Anjan Ghosh, PhD Dissertation, "Performance Evaluation of Optical Linear Algebra
Processors" (April 1984).

11. Eugene Pochapsky, M.S. Dissertation, "The Simulation of Optical Pattern Recognition
Systems" (August 1984).

12. William Rozzi, M.S. Dissertation, "New Distortion-Invariant Correlator Research" (Expected
in December 1984).

13. Bruce Thomas, M.S. Dissertation, "Moments for Distortion Parameter Estimation" (Expected
in December 1984).

14.  Wen-Thong  Chang,  PhD  Dissertation,  "Shift-Invariant  and  Distortion-Invariant  Pattern 
Recognition  Techniques"  (Expected  in  February  1985). 

16.4  PATENT  DISCLOSURES  (SEPTEMBER  1980  -  SEPTEMBER  1984) 

1.  Multiple-Invariant  Space-Variant  Pattern  Recognition  System. 

2.  Pattern  Recognition  by  Invariant  Moments. 

3.  Synthetic  Discriminant  Functions  for  Multi-Class  Pattern  Recognition. 

4.  Equalization  and  Coherent  Measure  Correlator. 

5.  Multi-Variant  Technique  for  Multi-Class  Pattern  Recognition. 

